Exam 1

  1. Question

    A random procedure generated the following sample (sequence of measurements):

    $\mathbf{x} = 50, 57, 58, 51, 51, 59, 58, 59, 51, 56$

    You can download the data as a CSV file (for importing into a spreadsheet or R): list_counting_basic_data_92422.csv


    1. How many of the measurements are less than 56? In other words, determine $\#[\mathbf{x}<56]$.
    2. How many of the measurements are less than or equal to 51? In other words, determine $\#[\mathbf{x}\le 51]$.
    3. How many of the measurements are greater than 59? In other words, determine $\#[\mathbf{x}>59]$.
    4. How many of the measurements are greater than or equal to 50? In other words, determine $\#[\mathbf{x}\ge 50]$.
    5. If 1 of the measurements is less than $x_{\text{c}}$, and $x_{\text{c}}$ is a measurement from sample $\mathbf{x}$, then what is $x_{\text{c}}$? In other words, determine a measurement $x_{\text{c}}$ from sequence $\mathbf{x}$ such that $\#[\mathbf{x}<x_{\text{c}}] = 1$.
    6. If 6 of the measurements are less than or equal to $x_{\text{c}}$, and $x_{\text{c}}$ is a measurement from sample $\mathbf{x}$, then what is $x_{\text{c}}$? In other words, determine a measurement $x_{\text{c}}$ from sequence $\mathbf{x}$ such that $\#[\mathbf{x}\le x_{\text{c}}] = 6$.
    7. If 2 of the measurements are greater than $x_{\text{c}}$, and $x_{\text{c}}$ is a measurement from sample $\mathbf{x}$, then what is $x_{\text{c}}$? In other words, determine a measurement $x_{\text{c}}$ from sequence $\mathbf{x}$ such that $\#[\mathbf{x}>x_{\text{c}}] = 2$.
    8. If 4 of the measurements are greater than or equal to $x_{\text{c}}$, and $x_{\text{c}}$ is a measurement from sample $\mathbf{x}$, then what is $x_{\text{c}}$? In other words, determine a measurement $x_{\text{c}}$ from sequence $\mathbf{x}$ such that $\#[\mathbf{x}\ge x_{\text{c}}] = 4$.

    Solution

    First, it helps to sort the data: $\text{sort}(\mathbf{x}) = 50, 51, 51, 51, 56, 57, 58, 58, 59, 59$

    You could use R to sort the data:

    sort(c(50,57,58,51,51,59,58,59,51,56))
    ##  [1] 50 51 51 51 56 57 58 58 59 59

    You could also use a spreadsheet: import the table into a spreadsheet, highlight the column of measurements, and use the Sort function.


    1. There are 4 measurements less than 56.
    2. There are 4 measurements less than or equal to 51.
    3. There are 0 measurements greater than 59.
    4. There are 10 measurements greater than or equal to 50.
    5. Measurement value 51 has 1 measurement less than it, so $x_{\text{c}}=51$.
    6. Measurement value 57 has 6 measurements less than or equal to it, so $x_{\text{c}}=57$.
    7. Measurement value 58 has 2 measurements greater than it, so $x_{\text{c}}=58$.
    8. Measurement value 58 has 4 measurements greater than or equal to it, so $x_{\text{c}}=58$.
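    The counts above can also be checked in R. This is a minimal sketch that assumes the sample is typed in directly rather than read from the CSV. A comparison such as `x < 56` gives a vector of TRUE/FALSE values, and `sum()` counts the TRUE values. The guess-and-check search for $x_{\text{c}}$ is automated by trying each distinct sample value; `find_xc` is a hypothetical helper name made up for this sketch.

```r
# Sample from Question 1, typed in directly
x <- c(50, 57, 58, 51, 51, 59, 58, 59, 51, 56)

# sum() of a logical vector counts the TRUE entries
sum(x < 56)    # 4
sum(x <= 51)   # 4
sum(x > 59)    # 0
sum(x >= 50)   # 10

# Hypothetical helper: try every distinct sample value as xc and keep
# those whose count under the comparison `cmp` equals `target`
find_xc <- function(x, cmp, target) {
  candidates <- sort(unique(x))
  candidates[sapply(candidates, function(xc) sum(cmp(x, xc)) == target)]
}

find_xc(x, `<`, 1)    # 51
find_xc(x, `<=`, 6)   # 57
find_xc(x, `>`, 2)    # 58
find_xc(x, `>=`, 4)   # 58
```

    Passing the comparison operator itself as a function (R operators can be named with backticks) lets one helper cover all four questions. If several sample values satisfy the condition, all of them are returned.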

  2. Question

    A random procedure generated the following sample (sequence of measurements):

    $\mathbf{x} = 25, 24, 27, 22, 23, 21, 25, 27, 26, 26$

    You can download the data as a CSV file (for importing into a spreadsheet or R): list_counting_between_data_64523.csv


    1. How many of the measurements are inside 20.5 to 26.5? In other words, determine $\#[\mathbf{x}>20.5\texttt{ AND } \mathbf{x}<26.5]$, or equivalently $\#[20.5<\mathbf{x}<26.5]$.
    2. How many of the measurements are outside 23.5 to 28.5? In other words, determine $\#[\mathbf{x}<23.5\texttt{ OR } \mathbf{x}>28.5]$, or equivalently $\#[\texttt{NOT}(23.5<\mathbf{x}<28.5)]$.
    3. How many of the measurements are closer than 2 units from 22.5? In other words, determine $\#\left[\big|\mathbf{x}-22.5\big|<2\right]$.
    4. How many of the measurements are farther than 1.5 units from 28? In other words, determine $\#\left[\big|\mathbf{x}-28\big|>1.5\right]$.
    5. Determine a half-integer radius $r$ of an interval with center 23 such that $\#\left[\big|\mathbf{x}-23\big|<r\right] = 1$.
    6. Determine an integer radius $r$ of an interval with center 26.5 such that $\#\left[\big|\mathbf{x}-26.5\big|>r\right] = 4$.

    Solution

    First, it helps to sort the data: $\text{sort}(\mathbf{x}) = 21, 22, 23, 24, 25, 25, 26, 26, 27, 27$

    You could use R to sort the data:

    sort(c(25,24,27,22,23,21,25,27,26,26))
    ##  [1] 21 22 23 24 25 25 26 26 27 27

    You could also use a spreadsheet: import the table into a spreadsheet, highlight the column of measurements, and use the Sort function.


    1. There are 8 measurements between 20.5 and 26.5.
    2. There are 3 measurements outside 23.5 to 28.5.
    3. There are 4 measurements closer than 2 units from 22.5. It might help to rephrase the problem as finding how many measurements are between 20.5 and 24.5. These boundaries are found by subtracting 2 from 22.5 and adding 2 to 22.5.
    4. There are 8 measurements farther than 1.5 units from 28. It might help to rephrase the problem as finding how many measurements are outside 26.5 to 29.5. These boundaries are found by subtracting 1.5 from 28 and adding 1.5 to 28. All 8 measurements below 26.5 (21, 22, 23, 24, 25, 25, 26, 26) count, and none of the measurements are above 29.5.
    5. You can guess and check until you find a half-integer that satisfies the equation. It turns out that $r = 0.5$ satisfies the equation, because $\#\left[\big|\mathbf{x}-23\big|<0.5\right] = 1$.
    6. You can guess and check until you find an integer that satisfies the equation. It turns out that $r = 2$ satisfies the equation, because $\#\left[\big|\mathbf{x}-26.5\big|>2\right] = 4$.
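    A minimal R sketch of the same counts, assuming the sample is typed in directly. Inside an interval means both conditions hold (`&`), outside means either condition holds (`|`), and distance from a center uses `abs()`. For the radius questions, the sketch tries a short list of candidate radii; the upper limits 4.5 and 5 are arbitrary choices that happen to be large enough for this sample.

```r
# Sample from Question 2, typed in directly
x <- c(25, 24, 27, 22, 23, 21, 25, 27, 26, 26)

sum(x > 20.5 & x < 26.5)   # inside 20.5 to 26.5: 8
sum(x < 23.5 | x > 28.5)   # outside 23.5 to 28.5: 3
sum(abs(x - 22.5) < 2)     # closer than 2 units from 22.5: 4
sum(abs(x - 28) > 1.5)     # farther than 1.5 units from 28: 8

# Guess and check over candidate radii
half_int <- seq(0.5, 4.5, by = 1)   # half-integers 0.5, 1.5, ..., 4.5
r5 <- half_int[sapply(half_int, function(r) sum(abs(x - 23) < r) == 1)]
r5   # 0.5

ints <- 1:5                         # integers 1, 2, ..., 5
r6 <- ints[sapply(ints, function(r) sum(abs(x - 26.5) > r) == 4)]
r6   # 2
```

    Restricting the candidates to the kind of radius each question asks for matters: if half-integers and integers were mixed together, both $r = 1.5$ and $r = 2$ would satisfy $\#[|\mathbf{x}-26.5|>r] = 4$.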

  3. Question

    A random procedure generated many measurements: list_counting_large_data_80983.csv

    51.94, 51.97, 52.04, 50.64, 57.32, 58.67, 50.92, 58.9, 56.81, 56.44, 51.99, 52.43, 51.49, 53.19, 51.15, 59.5, 55.58, 51.9, 52.56, 54.75, 51.33, 56.55, 53.7, 53.22, 52.44, 52.52, 51.28, 56.02, 53.15, 54.86, 58.18, 50.58, 55.11, 53.01, 51.81, 56.05, 56.19, 52.59, 55.53, 55.89, 53.59, 50.79, 52.17, 52.52, 59.18, 54.09, 50.71, 58.73, 51.7, 50.63, 59.5, 55.11, 51.24, 55.48, 59.13, 53.15, 52.19, 55.39, 50.13, 57.45, 56.39, 57.42, 57.53, 59.61, 57.6, 57.89, 53.85, 55.16, 59.97, 54.53, 58.73, 58.25, 59.12, 59.36, 52.54, 52.11, 55.36, 54.33, 50.92, 52.52, 51.62, 51.52, 54.56, 50.76, 52.99, 53.52, 50.76, 59.99, 53.29, 59.25, 50.8, 53.69, 50.3, 55, 56.88, 51.45, 55.34, 57.45, 59.17, 58.17, 56.76, 55.17, 53.96, 52.15, 56.48, 53.95, 52.74, 50.83, 50.48, 57.19, 51.06, 55.58, 55.15, 50.17, 57.17, 56.01, 54.12, 57.79, 53.23, 52.47, 52.39, 57.12, 52.95, 55.31, 54.79, 57.55, 56.39, 51.55, 59.47, 50.82, 58.19, 59.4, 51.92, 59.18, 54.99, 59.94, 55.5, 58.72, 57.97, 52.26, 58.82, 56.7, 50.59, 50.78, 57.86, 55.18, 56.93, 56.84, 52.4, 56.51, 54.46, 50.79, 53.84, 54.69, 57.5, 50.33, 55.4, 54.9, 55.65, 53.88, 55.3, 54.11, 57.63, 54.19, 57.74, 50.04, 57.13, 50.27, 51.63, 54.32, 56.36, 57.35, 55.11, 58.8, 58.19

    1. How many measurements are in the sample? In other words, determine $\#[\mathbf{x}]$. In the future, we will use $n$ to denote the number of measurements in the sample.
    2. How many of the measurements are less than 57? In other words, determine $\#[\mathbf{x}<57]$.
    3. How many of the measurements are greater than 55.5? In other words, determine $\#[\mathbf{x}>55.5]$.
    4. How many of the measurements are closer than 3 units from 57? In other words, determine $\#\left[\big|\mathbf{x}-57\big|<3\right]$.
    5. How many of the measurements are farther than 1.5 units from 56.2? In other words, determine $\#\left[\big|\mathbf{x}-56.2\big|>1.5\right]$.

    Solution

    You will want to use a computer to answer these questions.


    If you used a spreadsheet, you should end up with this solution csv.

    To use R, the following commands would answer the questions.

    x = c(51.94,51.97,52.04,50.64,57.32,58.67,50.92,58.9,56.81,56.44,51.99,52.43,51.49,53.19,51.15,59.5,55.58,51.9,52.56,54.75,51.33,56.55,53.7,53.22,52.44,52.52,51.28,56.02,53.15,54.86,58.18,50.58,55.11,53.01,51.81,56.05,56.19,52.59,55.53,55.89,53.59,50.79,52.17,52.52,59.18,54.09,50.71,58.73,51.7,50.63,59.5,55.11,51.24,55.48,59.13,53.15,52.19,55.39,50.13,57.45,56.39,57.42,57.53,59.61,57.6,57.89,53.85,55.16,59.97,54.53,58.73,58.25,59.12,59.36,52.54,52.11,55.36,54.33,50.92,52.52,51.62,51.52,54.56,50.76,52.99,53.52,50.76,59.99,53.29,59.25,50.8,53.69,50.3,55,56.88,51.45,55.34,57.45,59.17,58.17,56.76,55.17,53.96,52.15,56.48,53.95,52.74,50.83,50.48,57.19,51.06,55.58,55.15,50.17,57.17,56.01,54.12,57.79,53.23,52.47,52.39,57.12,52.95,55.31,54.79,57.55,56.39,51.55,59.47,50.82,58.19,59.4,51.92,59.18,54.99,59.94,55.5,58.72,57.97,52.26,58.82,56.7,50.59,50.78,57.86,55.18,56.93,56.84,52.4,56.51,54.46,50.79,53.84,54.69,57.5,50.33,55.4,54.9,55.65,53.88,55.3,54.11,57.63,54.19,57.74,50.04,57.13,50.27,51.63,54.32,56.36,57.35,55.11,58.8,58.19)
    length(x)
    ## [1] 175
    sum(x<57)
    ## [1] 129
    sum(x>55.5)
    ## [1] 68
    sum(abs(x-57)<3)
    ## [1] 99
    sum(abs(x-56.2)>1.5)
    ## [1] 118

    1. There are 175 measurements.
    2. There are 129 measurements less than 57.
    3. There are 68 measurements greater than 55.5.
    4. There are 99 measurements closer than 3 units from 57. You could also say there are 99 measurements inside 54 to 60.
    5. There are 118 measurements farther than 1.5 units from 56.2. You could also say there are 118 measurements outside 54.7 to 57.7.

  4. Question

    A random procedure generated many measurements: download data

    Please complete the frequency distribution using breaks 70, 75, 80, 85, 90:

    Interval Frequency
    70 to 75
    75 to 80
    80 to 85
    85 to 90


    Solution

    You will want to use a computer to answer these questions.

    In a spreadsheet, open the data, add the breaks as a column; then, use the FREQUENCY function.

    In R, open the data and use the hist function. You supply the breaks and read the counts:

    mydata = read.csv("make_freq_dist.csv")
    x = mydata$x
    myhist = hist(x,breaks=c(70,75,80,85,90))
    myhist$counts
    ## [1] 48 12  5  5
    interval frequency
    70 to 75 48
    75 to 80 12
    80 to 85 5
    85 to 90 5


  5. Question

    A random procedure generated 75 measurements, which were organized into the frequency distribution shown below. You can assume the measurements are of a continuous random variable, such that every measurement is in one of the intervals (and not on a break).

    interval frequency
    55 to 60 7
    60 to 65 13
    65 to 70 21
    70 to 75 12
    75 to 80 12
    80 to 85 10

    1. Evaluate $\#[\mathbf{x}<80]$.
    2. Evaluate $\#[\mathbf{x}>60]$.
    3. Evaluate $\#[|\mathbf{x}-72.5|<7.5]$.
    4. Evaluate $\#[|\mathbf{x}-67.5|>7.5]$.
    5. Evaluate boundary $b$ such that $\#[\mathbf{x}<b] = 65$.
    6. Evaluate boundary $b$ such that $\#[\mathbf{x}>b] = 22$.
    7. Evaluate radius $r$ such that $\#[|\mathbf{x}-70|<r] = 58$.
    8. Evaluate radius $r$ such that $\#[|\mathbf{x}-67.5|>r] = 29$.

    Solution

    The first 4 questions involve adding up frequencies of the indicated intervals. The last 4 questions can be done by guessing and checking until something works.


    1. There are 65 measurements less than 80 (7 + 13 + 21 + 12 + 12). So, $\#[\mathbf{x}<80]=65$.
    2. There are 68 measurements more than 60 (13 + 21 + 12 + 12 + 10). So, $\#[\mathbf{x}>60]=68$.
    3. There are 45 measurements closer than 7.5 units from 72.5 (21 + 12 + 12). So, $\#[|\mathbf{x}-72.5|<7.5]=45$.
    4. There are 29 measurements farther than 7.5 units from 67.5 (7 + 12 + 10). So, $\#[|\mathbf{x}-67.5|>7.5]=29$.
    5. A boundary at $b=80$ has 65 measurements less than it. So, $\#[\mathbf{x}<80]=65$.
    6. A boundary at $b=75$ has 22 measurements more than it. So, $\#[\mathbf{x}>75]=22$.
    7. An interval with radius $r=10$ around center 70 contains 58 measurements. So, $\#[|\mathbf{x}-70|<10]=58$.
    8. An interval with radius $r=7.5$ around center 67.5 excludes 29 measurements. So, $\#[|\mathbf{x}-67.5|>7.5]=29$.
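    When only a frequency distribution is available, each count comes from adding whole intervals, which works as long as every boundary used falls on a break. A minimal R sketch of that idea follows; the function names `count_below`, `count_above`, `count_inside`, and `count_outside` are hypothetical, and `count_inside` stops with an error if an endpoint is not on a break. The guess-and-check searches only try radii whose endpoints land on breaks.

```r
# Frequency distribution from Question 5
breaks <- seq(55, 85, by = 5)
lower  <- head(breaks, -1)   # 55, 60, ..., 80
upper  <- tail(breaks, -1)   # 60, 65, ..., 85
freq   <- c(7, 13, 21, 12, 12, 10)

# Add up the intervals entirely below / above boundary b
count_below <- function(b) sum(freq[upper <= b])
count_above <- function(b) sum(freq[lower >= b])

# Add up the intervals entirely inside center - r to center + r
count_inside <- function(center, r) {
  # grouped data only support endpoints that sit on breaks
  stopifnot((center - r) %in% breaks, (center + r) %in% breaks)
  sum(freq[lower >= center - r & upper <= center + r])
}
count_outside <- function(center, r) sum(freq) - count_inside(center, r)

count_below(80)            # 65
count_above(60)            # 68
count_inside(72.5, 7.5)    # 45
count_outside(67.5, 7.5)   # 29

# Guess and check: boundaries on breaks, radii whose endpoints land on breaks
b5 <- breaks[sapply(breaks, function(b) count_below(b) == 65)]   # 80
b6 <- breaks[sapply(breaks, function(b) count_above(b) == 22)]   # 75
radii7 <- c(5, 10, 15)
r7 <- radii7[sapply(radii7, function(r) count_inside(70, r) == 58)]       # 10
radii8 <- c(2.5, 7.5, 12.5)
r8 <- radii8[sapply(radii8, function(r) count_outside(67.5, r) == 29)]    # 7.5
```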

  6. Question

    A random procedure generated 70 measurements, which were organized into the histogram shown below. You can assume the measurements are of a continuous variable, such that every measurement is in one of the intervals (and not on a break).

    [Figure: histogram of the 70 measurements]


    1. Evaluate $\#[\mathbf{x}<50]$.
    2. Evaluate $\#[\mathbf{x}>50]$.
    3. Evaluate $\#[|\mathbf{x}-45|<5]$.
    4. Evaluate $\#[|\mathbf{x}-50|>5]$.
    5. Evaluate boundary $b$ such that $\#[\mathbf{x}<b] = 58$.
    6. Evaluate boundary $b$ such that $\#[\mathbf{x}>b] = 12$.
    7. Evaluate radius $r$ such that $\#[|\mathbf{x}-47.5|<r] = 24$.
    8. Evaluate radius $r$ such that $\#[|\mathbf{x}-52.5|>r] = 65$.

    Solution

    The first 4 questions involve adding up frequencies of the indicated intervals. The last 4 questions can be done by guessing and checking until something works.


    1. There are 64 measurements less than 50. So, $\#[\mathbf{x}<50]=64$.
    2. There are 6 measurements more than 50. So, $\#[\mathbf{x}>50]=6$.
    3. There are 19 measurements closer than 5 units from 45. So, $\#[|\mathbf{x}-45|<5]=19$.
    4. There are 59 measurements farther than 5 units from 50. So, $\#[|\mathbf{x}-50|>5]=59$.
    5. A boundary at $b=45$ has 58 measurements less than it. So, $\#[\mathbf{x}<45]=58$.
    6. A boundary at $b=45$ has 12 measurements more than it. So, $\#[\mathbf{x}>45]=12$.
    7. An interval with radius $r=7.5$ around center 47.5 contains 24 measurements. So, $\#[|\mathbf{x}-47.5|<7.5]=24$.
    8. An interval with radius $r=2.5$ around center 52.5 excludes 65 measurements. So, $\#[|\mathbf{x}-52.5|>2.5]=65$.

  7. Question

    A standard twelve-sided die was rolled 80 times, and the results were organized into the histogram shown below. In dice notation, we could say the results of 80d12 were plotted as a histogram. (Pedantic sidenote: it is common to interpret 80d12 as the SUM of 80 rolls, but we will interpret 80d12 as the LIST of 80 rolls and use $\sum 80\text{d}12$ as the sum of 80 rolls.)

    [Figure: histogram of the 80 rolls]


    1. Evaluate $\#[\mathbf{x}<9.5]$.
    2. Evaluate $\#[\mathbf{x}>7.5]$.
    3. Evaluate $\#[|\mathbf{x}-2|<0.5]$.
    4. Evaluate $\#[|\mathbf{x}-5|>3.5]$.
    5. Evaluate half-integer boundary $b$ such that $\#[\mathbf{x}<b] = 37$.
    6. Evaluate half-integer boundary $b$ such that $\#[\mathbf{x}>b] = 31$.
    7. Evaluate half-integer radius $r$ such that $\#[|\mathbf{x}-3|<r] = 6$.
    8. Evaluate half-integer radius $r$ such that $\#[|\mathbf{x}-5|>r] = 46$.

    Solution

    The first 4 questions involve adding up frequencies of the indicated intervals. The last 4 questions can be done by guessing and checking until something works.


    1. There are 57 measurements less than 9.5. So, $\#[\mathbf{x}<9.5]=57$.
    2. There are 31 measurements more than 7.5. So, $\#[\mathbf{x}>7.5]=31$.
    3. There are 4 measurements closer than 0.5 units from 2. So, $\#[|\mathbf{x}-2|<0.5]=4$.
    4. There are 38 measurements farther than 3.5 units from 5. So, $\#[|\mathbf{x}-5|>3.5]=38$.
    5. A boundary at $b=5.5$ has 37 measurements less than it. So, $\#[\mathbf{x}<5.5]=37$.
    6. A boundary at $b=7.5$ has 31 measurements more than it. So, $\#[\mathbf{x}>7.5]=31$.
    7. An interval with radius $r=0.5$ around center 3 contains 6 measurements. So, $\#[|\mathbf{x}-3|<0.5]=6$.
    8. An interval with radius $r=2.5$ around center 5 excludes 46 measurements. So, $\#[|\mathbf{x}-5|>2.5]=46$.

  8. Question

    A standard eight-sided die was rolled 100 times, and the results were organized into the pie chart shown below.

    [Figure: pie chart of the 100 rolls]


    1. Evaluate $\#[\mathbf{x}<3.5]$.
    2. Evaluate $\#[\mathbf{x}>5.5]$.
    3. Evaluate $\#[|\mathbf{x}-4|<0.5]$.
    4. Evaluate $\#[|\mathbf{x}-3.5|>2]$.
    5. Evaluate half-integer boundary $b$ such that $\#[\mathbf{x}<b] = 56$.
    6. Evaluate half-integer boundary $b$ such that $\#[\mathbf{x}>b] = 23$.
    7. Evaluate half-integer radius $r$ such that $\#[|\mathbf{x}-3|<r] = 37$.
    8. Evaluate integer radius $r$ such that $\#[|\mathbf{x}-6.5|>r] = 76$.

    Solution

    The first 4 questions involve adding up frequencies of the indicated intervals. The last 4 questions can be done by guessing and checking until something works.


    1. There are 45 measurements less than 3.5. So, $\#[\mathbf{x}<3.5]=45$.
    2. There are 34 measurements more than 5.5. So, $\#[\mathbf{x}>5.5]=34$.
    3. There are 11 measurements closer than 0.5 units from 4. So, $\#[|\mathbf{x}-4|<0.5]=11$.
    4. There are 53 measurements farther than 2 units from 3.5. So, $\#[|\mathbf{x}-3.5|>2]=53$.
    5. A boundary at $b=4.5$ has 56 measurements less than it. So, $\#[\mathbf{x}<4.5]=56$.
    6. A boundary at $b=6.5$ has 23 measurements more than it. So, $\#[\mathbf{x}>6.5]=23$.
    7. An interval with radius $r=1.5$ around center 3 contains 37 measurements. So, $\#[|\mathbf{x}-3|<1.5]=37$.
    8. An interval with radius $r=1$ around center 6.5 excludes 76 measurements. So, $\#[|\mathbf{x}-6.5|>1]=76$.

  9. Question

    A random procedure generated the following sample (sequence of measurements):

    $\mathbf{x} = 14, 19, 10, 16, 17, 10, 12, 10, 19, 17$

    You can download the data as a CSV file (for importing into a spreadsheet or R): list_counting_basic_data_41468.csv


    1. What proportion of the measurements are less than 10? In other words, determine $\text{prop}[\mathbf{x}<10]$.
    2. What proportion of the measurements are less than or equal to 19? In other words, determine $\text{prop}[\mathbf{x}\le 19]$.
    3. What proportion of the measurements are greater than 10? In other words, determine $\text{prop}[\mathbf{x}>10]$.
    4. What proportion of the measurements are greater than or equal to 17? In other words, determine $\text{prop}[\mathbf{x}\ge 17]$.
    5. If the proportion of measurements less than $x_{\text{c}}$ is 0.6, and $x_{\text{c}}$ is a measurement from sample $\mathbf{x}$, then what is $x_{\text{c}}$? In other words, determine a measurement $x_{\text{c}}$ from sequence $\mathbf{x}$ such that $\text{prop}[\mathbf{x}<x_{\text{c}}] = 0.6$.
    6. If the proportion of measurements less than or equal to $x_{\text{c}}$ is 0.5, and $x_{\text{c}}$ is a measurement from sample $\mathbf{x}$, then what is $x_{\text{c}}$? In other words, determine a measurement $x_{\text{c}}$ from sequence $\mathbf{x}$ such that $\text{prop}[\mathbf{x}\le x_{\text{c}}] = 0.5$.
    7. If the proportion of measurements greater than $x_{\text{c}}$ is 0, and $x_{\text{c}}$ is a measurement from sample $\mathbf{x}$, then what is $x_{\text{c}}$? In other words, determine a measurement $x_{\text{c}}$ from sequence $\mathbf{x}$ such that $\text{prop}[\mathbf{x}>x_{\text{c}}] = 0$.
    8. If the proportion of measurements greater than or equal to $x_{\text{c}}$ is 0.5, and $x_{\text{c}}$ is a measurement from sample $\mathbf{x}$, then what is $x_{\text{c}}$? In other words, determine a measurement $x_{\text{c}}$ from sequence $\mathbf{x}$ such that $\text{prop}[\mathbf{x}\ge x_{\text{c}}] = 0.5$.

    Solution

    First, it helps to sort the data: $\text{sort}(\mathbf{x}) = 10, 10, 10, 12, 14, 16, 17, 17, 19, 19$

    You could use R to sort the data:

    sort(c(14,19,10,16,17,10,12,10,19,17))
    ##  [1] 10 10 10 12 14 16 17 17 19 19

    You could also use a spreadsheet: import the table into a spreadsheet, highlight the column of measurements, and use the Sort function.


    1. The proportion of measurements less than 10 is 0. $\text{prop}[\mathbf{x}<10]=0$
    2. The proportion of measurements less than or equal to 19 is 1. $\text{prop}[\mathbf{x}\le 19]=1$
    3. The proportion of measurements greater than 10 is 0.7. $\text{prop}[\mathbf{x}> 10]=0.7$
    4. The proportion of measurements greater than or equal to 17 is 0.4. $\text{prop}[\mathbf{x}\ge 17]=0.4$
    5. The proportion of measurements less than 17 is 0.6. $\text{prop}[\mathbf{x}<17]=0.6$, so $x_{\text{c}}=17$.
    6. The proportion of measurements less than or equal to 14 is 0.5. $\text{prop}[\mathbf{x}\le 14]=0.5$, so $x_{\text{c}}=14$.
    7. The proportion of measurements greater than 19 is 0. $\text{prop}[\mathbf{x}> 19]=0$, so $x_{\text{c}}=19$.
    8. The proportion of measurements greater than or equal to 16 is 0.5. $\text{prop}[\mathbf{x}\ge 16]=0.5$, so $x_{\text{c}}=16$.
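    For proportions, R's `mean()` of a TRUE/FALSE vector gives the fraction of TRUE values, which is the count divided by $n$. A minimal sketch, assuming the sample is typed in directly; `near` is a hypothetical helper that compares proportions with a small tolerance, since fractions such as 0.6 are not stored exactly in floating point.

```r
# Sample from Question 9, typed in directly
x <- c(14, 19, 10, 16, 17, 10, 12, 10, 19, 17)

# mean() of a logical vector is the proportion of TRUE values
mean(x < 10)    # 0
mean(x <= 19)   # 1
mean(x > 10)    # 0.7
mean(x >= 17)   # 0.4

# Hypothetical helper: equality with a tolerance for floating-point values
near <- function(a, b) abs(a - b) < 1e-9

# Guess and check over the distinct sample values
cand <- sort(unique(x))
xc5 <- cand[sapply(cand, function(xc) near(mean(x < xc), 0.6))]   # 17
xc6 <- cand[sapply(cand, function(xc) near(mean(x <= xc), 0.5))]  # 14
xc7 <- cand[sapply(cand, function(xc) near(mean(x > xc), 0))]     # 19
xc8 <- cand[sapply(cand, function(xc) near(mean(x >= xc), 0.5))]  # 16
```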

  10. Question

    The lengths (in centimeters) of 10 lizards were recorded.

    $\mathbf{x} = 97, 98, 94, 93, 98, 96, 99, 92, 95, 99$

    You can download the data as a CSV file (for importing into a spreadsheet or R): lizard_data.csv


    1. What proportion of the measurements are inside 92.5 cm to 98.5 cm? In other words, determine $\text{prop}[\mathbf{x}>92.5\texttt{ AND } \mathbf{x}<98.5]$, or equivalently $\text{prop}[92.5<\mathbf{x}<98.5]$.
    2. What proportion of the measurements are outside 95.5 cm to 99.5 cm? In other words, determine $\text{prop}[\mathbf{x}<95.5\texttt{ OR } \mathbf{x}>99.5]$, or equivalently $\text{prop}[\texttt{NOT}(95.5<\mathbf{x}<99.5)]$.
    3. What proportion of the measurements are closer than 3.5 cm from 95 cm? In other words, determine $\text{prop}\left[\big|\mathbf{x}-95\big|<3.5\right]$.
    4. What proportion of the measurements are farther than 1 cm from 94.5 cm? In other words, determine $\text{prop}\left[\big|\mathbf{x}-94.5\big|>1\right]$.
    5. Determine a half-integer radius $r$ of an interval with center 97 cm such that $\text{prop}\left[\big|\mathbf{x}-97\big|<r\right] = 0.4$.
    6. Determine an integer radius $r$ of an interval with center 97.5 cm such that $\text{prop}\left[\big|\mathbf{x}-97.5\big|>r\right] = 0.7$.

    Solution

    First, it helps to sort the data: $\text{sort}(\mathbf{x}) = 92, 93, 94, 95, 96, 97, 98, 98, 99, 99$

    You could use R to sort the data:

    sort(c(97,98,94,93,98,96,99,92,95,99))
    ##  [1] 92 93 94 95 96 97 98 98 99 99

    You could also use a spreadsheet: import the table into a spreadsheet, highlight the column of measurements, and use the Sort function.


    1. The proportion of measurements between 92.5 and 98.5 is 0.7. $\text{prop}[92.5<\mathbf{x}<98.5]=0.7$
    2. The proportion of measurements outside 95.5 to 99.5 is 0.4. $\text{prop}[\texttt{NOT}(95.5<\mathbf{x}<99.5)]=0.4$
    3. The proportion of measurements closer than 3.5 cm from 95 cm is 0.8. $\text{prop}\left[\big|\mathbf{x}-95\big|<3.5\right] = 0.8$
    4. The proportion of measurements farther than 1 cm from 94.5 cm is 0.8. $\text{prop}\left[\big|\mathbf{x}-94.5\big|>1\right] = 0.8$
    5. You can guess and check until you find a half-integer that satisfies the equation. It turns out that $r = 1.5$ satisfies the equation, because $\text{prop}\left[\big|\mathbf{x}-97\big|<1.5\right] = 0.4$.
    6. You can guess and check until you find an integer that satisfies the equation. It turns out that $r = 1$ satisfies the equation, because $\text{prop}\left[\big|\mathbf{x}-97.5\big|>1\right] = 0.7$.
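    The same approach works with the lizard lengths. A minimal R sketch, assuming the lengths are typed in directly; `near` is again a hypothetical tolerance helper, and the candidate radii stop at 5.5 and 6, which is enough to cover every distance in this sample.

```r
# Lizard lengths (cm) from Question 10, typed in directly
x <- c(97, 98, 94, 93, 98, 96, 99, 92, 95, 99)

mean(x > 92.5 & x < 98.5)   # 0.7
mean(x < 95.5 | x > 99.5)   # 0.4
mean(abs(x - 95) < 3.5)     # 0.8
mean(abs(x - 94.5) > 1)     # 0.8

# Hypothetical helper: equality with a tolerance for floating-point values
near <- function(a, b) abs(a - b) < 1e-9

half_int <- seq(0.5, 5.5, by = 1)   # half-integers 0.5, 1.5, ..., 5.5
r5 <- half_int[sapply(half_int, function(r) near(mean(abs(x - 97) < r), 0.4))]   # 1.5
ints <- 1:6                         # integers 1, 2, ..., 6
r6 <- ints[sapply(ints, function(r) near(mean(abs(x - 97.5) > r), 0.7))]        # 1
```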

  11. Question

    Jordan is practicing free throws. They have recorded the results of many free throws.

    ## Hit Hit Hit Miss Hit Hit Hit Hit Hit Hit Hit Miss Miss Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Miss Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Miss Hit Miss Hit Hit Hit Hit Miss Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Miss Hit Hit Hit Hit Hit Hit Hit Miss Hit Hit Hit Hit Hit Hit Hit Miss Hit Hit Hit Miss Hit Hit Hit Hit Hit Miss Hit Miss Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Hit Miss Hit Hit Hit

    You can download the data as a csv: basketball_proportion.csv. The header and first four rows are shown below.

    i hit_or_miss
    1 Hit
    2 Hit
    3 Hit
    4 Miss
    ⋮ ⋮

    1. How many shots did Jordan attempt? (Evaluate $n$.)
    2. How many shots were successful? (Evaluate $n_\text{s}$.)
    3. How many shots were failures? (Evaluate $n_\text{f}$.)
    4. What proportion of Jordan’s shots were successful? (Evaluate $\hat{p}$.)
    5. What proportion of Jordan’s shots were failures? (Evaluate $\hat{q}$.)

    Solution

    I recommend using either R or a spreadsheet.

    Using R:

    First, download the CSV. Then write the following script and save it as basketball_proportion.r. Put both files in the same directory (folder) and run the script.

    mydata = read.csv("basketball_proportion.csv")
    x = mydata$hit_or_miss
    n = length(x)
    ns = sum(x=="Hit")
    nf = sum(x=="Miss")
    phat = ns/n
    qhat = nf/n
    cat(sprintf("n=%d, ns=%d, nf=%d, phat=%.4f, qhat=%.4f",n,ns,nf,phat,qhat))
    ## n=106, ns=92, nf=14, phat=0.8679, qhat=0.1321

    If you are using RStudio, you may need to click Session > Set Working Directory > To Source File Location while the script (basketball_proportion.r) is the open tab.
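    Once the column is loaded as `x` (as in the script above), `table()` counts each outcome and `prop.table()` turns those counts into proportions, giving all five answers at once. So that this sketch runs without the CSV, it rebuilds a stand-in vector with the counts reported above (92 hits, 14 misses); the shot order is lost, but the order does not affect any of the five answers.

```r
# With the file: x <- read.csv("basketball_proportion.csv")$hit_or_miss
# Stand-in with the same counts, so the sketch runs on its own
x <- c(rep("Hit", 92), rep("Miss", 14))

n      <- length(x)           # number of shots
counts <- table(x)            # counts of "Hit" and "Miss"
props  <- prop.table(counts)  # each count divided by n

counts
round(props, 4)               # Hit 0.8679, Miss 0.1321
```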

    Using a spreadsheet

    First, if you scroll down, it should be clear there are 106 rows of data, because the last row has $i=106$. Since row 1 holds the header, the data occupy rows 2 through 107. In cell C2 enter =IF(B2="Hit",1,0) and in cell D2 enter =IF(B2="Miss",1,0), then fill both formulas down to row 107 to get columns of 0s and 1s. Then use =SUM(C2:C107) and =SUM(D2:D107) to get $n_\text{s}$ and $n_\text{f}$. You can divide these by $n$ to determine $\hat{p}$ and $\hat{q}$.

    You can see a solution CSV: proportion_solution.csv. Remember, you can press Ctrl+~ to see the formulas. You may need to widen a column if a cell shows ###.



  12. Question

    A random procedure generated $n=130$ measurements: download data

    I’ve already determined the frequencies. Please determine the relative frequencies and the densities. A brief description of relative frequency and density can be found here.

    Interval Frequency Relative Frequency Density
    30 to 32 4
    32 to 34 5
    34 to 36 9
    36 to 38 43
    38 to 40 69


    Solution

    To determine the relative frequencies, just divide each frequency by 130 (because $n=130$). To determine the densities, divide the relative frequencies by the width of the interval, which in this case is the same for each interval ($32-30=2$).

    Interval Frequency Relative Frequency Density
    30 to 32 4 0.03077 0.01538
    32 to 34 5 0.03846 0.01923
    34 to 36 9 0.06923 0.03462
    36 to 38 43 0.3308 0.1654
    38 to 40 69 0.5308 0.2654
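    The two divisions can be sketched in R as below, with the frequencies copied from the table. Here `signif(..., 4)` rounds to four significant digits to match the table. As a check, the densities times the width add up to 1, because the relative frequencies do.

```r
freq  <- c(4, 5, 9, 43, 69)   # frequencies from the table
n     <- sum(freq)            # 130
width <- 32 - 30              # every interval has width 2

rel_freq <- freq / n          # relative frequency = frequency / n
dens     <- rel_freq / width  # density = relative frequency / width

data.frame(interval  = c("30 to 32", "32 to 34", "34 to 36", "36 to 38", "38 to 40"),
           frequency = freq,
           rel_freq  = signif(rel_freq, 4),
           density   = signif(dens, 4))

sum(dens * width)             # total area of the bars: 1
```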


  13. Question

    A random procedure generated 120 measurements, which were organized into the frequency distribution shown below. You can assume the measurements are of a continuous random variable, such that every measurement is in one of the intervals (and not on a break).

    interval frequency relative frequency density
    60 to 65 51 0.425 0.085
    65 to 70 10 0.08333 0.01667
    70 to 75 5 0.04167 0.008333
    75 to 80 3 0.025 0.005
    80 to 85 13 0.1083 0.02167
    85 to 90 38 0.3167 0.06333

    1. Evaluate $\text{prop}[\mathbf{x}<70]$.
    2. Evaluate $\text{prop}[\mathbf{x}>65]$.
    3. Evaluate $\text{prop}[|\mathbf{x}-75|<5]$.
    4. Evaluate $\text{prop}[|\mathbf{x}-80|>5]$.
    5. Evaluate boundary $b$ such that $\text{prop}[\mathbf{x}<b] = 0.6833$.
    6. Evaluate boundary $b$ such that $\text{prop}[\mathbf{x}>b] = 0.575$.
    7. Evaluate radius $r$ such that $\text{prop}[|\mathbf{x}-72.5|<r] = 0.15$.
    8. Evaluate radius $r$ such that $\text{prop}[|\mathbf{x}-67.5|>r] = 0.9167$.

    Solution

    The first 4 questions involve adding up relative frequencies of the indicated intervals. The last 4 questions can be done by guessing and checking until something works.


    1. There are 61 measurements less than 70 (51 + 10). So, $\text{prop}[\mathbf{x}<70]=\frac{61}{120}=0.5083$.
    2. There are 69 measurements more than 65 (10 + 5 + 3 + 13 + 38). So, $\text{prop}[\mathbf{x}>65]=\frac{69}{120}=0.575$.
    3. There are 8 measurements closer than 5 units from 75 (5 + 3). So, $\text{prop}[|\mathbf{x}-75|<5]=\frac{8}{120}=0.06667$.
    4. There are 104 measurements farther than 5 units from 80 (51 + 10 + 5 + 38). So, $\text{prop}[|\mathbf{x}-80|>5]=\frac{104}{120}=0.8667$.
    5. First, $0.6833 \cdot 120 \approx 82$. A boundary at $b=85$ has 82 measurements less than it. So, $\text{prop}[\mathbf{x}<85]=0.6833$.
    6. First, $0.575 \cdot 120 = 69$. A boundary at $b=65$ has 69 measurements more than it. So, $\text{prop}[\mathbf{x}>65]=0.575$.
    7. First, $0.15 \cdot 120 = 18$. An interval with radius $r=7.5$ around center 72.5 contains 18 measurements. So, $\text{prop}[|\mathbf{x}-72.5|<7.5]=0.15$.
    8. First, $0.9167 \cdot 120 \approx 110$. An interval with radius $r=2.5$ around center 67.5 excludes 110 measurements. So, $\text{prop}[|\mathbf{x}-67.5|>2.5]=0.9167$.
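    A minimal R sketch of the guess-and-check parts of this question, with hypothetical helper names. The target proportions are rounded to four digits (for example, $0.6833 \cdot 120 = 81.996$), so the sketch first converts each target back to a whole count with `round()` and then compares counts, which avoids comparing rounded decimals.

```r
# Frequency distribution from Question 13
breaks <- seq(60, 90, by = 5)
lower  <- head(breaks, -1)
upper  <- tail(breaks, -1)
freq   <- c(51, 10, 5, 3, 13, 38)
n      <- sum(freq)                      # 120

# Hypothetical helpers: add up whole intervals (boundaries must sit on breaks)
count_below  <- function(b) sum(freq[upper <= b])
count_above  <- function(b) sum(freq[lower >= b])
count_inside <- function(center, r) sum(freq[lower >= center - r & upper <= center + r])

count_below(70) / n                      # 61/120, about 0.5083

# Convert each rounded target proportion to a whole count first
b5 <- breaks[sapply(breaks, function(b) count_below(b) == round(0.6833 * n))]   # 85
b6 <- breaks[sapply(breaks, function(b) count_above(b) == round(0.575 * n))]    # 65
radii <- c(2.5, 7.5, 12.5)               # endpoints around 72.5 land on breaks
r7 <- radii[sapply(radii, function(r) count_inside(72.5, r) == round(0.15 * n))] # 7.5
```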

  14. Question

    A random procedure generated 50 measurements, which were organized into the histogram shown below. You can assume the measurements are of a continuous variable, such that every measurement is in one of the intervals (and not on a break).

    [Figure: density histogram of the 50 measurements]


    1. Evaluate $\text{prop}[\mathbf{x}<68]$.
    2. Evaluate $\text{prop}[\mathbf{x}>67]$.
    3. Evaluate $\text{prop}[|\mathbf{x}-67.75|<0.75]$.
    4. Evaluate $\text{prop}[|\mathbf{x}-68.25|>1.25]$.
    5. Evaluate boundary $b$ such that $\text{prop}[\mathbf{x}<b] = 0.06$.
    6. Evaluate boundary $b$ such that $\text{prop}[\mathbf{x}>b] = 0.74$.
    7. Evaluate radius $r$ such that $\text{prop}[|\mathbf{x}-67.25|<r] = 0.08$.
    8. Evaluate radius $r$ such that $\text{prop}[|\mathbf{x}-67|>r] = 0.88$.

    Solution

    You may find it helpful to convert the densities to frequencies by multiplying each density by both the total sample size ($n=50$) and the width of the bar (0.5).

    [Figure: histogram with bar heights converted to frequencies]

    The first 4 questions involve adding up frequencies of the indicated intervals. The last 4 questions can be done by guessing and checking until something works.


    1. There are 13 measurements less than 68. So, $\text{prop}[\mathbf{x}<68]=0.26$.
    2. There are 47 measurements more than 67. So, $\text{prop}[\mathbf{x}>67]=0.94$.
    3. There are 15 measurements within 0.75 units of 67.75. So, $\text{prop}[|\mathbf{x}-67.75|<0.75]=0.3$.
    4. There are 24 measurements farther than 1.25 units from 68.25. So, $\text{prop}[|\mathbf{x}-68.25|>1.25]=0.48$.
    5. A boundary at $b=67$ has 3 measurements less than it. So, $\text{prop}[\mathbf{x}<67]=0.06$.
    6. A boundary at $b=68$ has 37 measurements more than it. So, $\text{prop}[\mathbf{x}>68]=0.74$.
    7. An interval with radius $r=0.25$ around center 67.25 contains 4 measurements. So, $\text{prop}[|\mathbf{x}-67.25|<0.25]=0.08$.
    8. An interval with radius $r=0.5$ around center 67 excludes 44 measurements. So, $\text{prop}[|\mathbf{x}-67|>0.5]=0.88$.
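    The density-to-frequency conversion described in the solution comes down to frequency = density × $n$ × bar width, which here is $50 \times 0.5 = 25$ times the density. Below is a minimal R sketch. The helper name density_to_freq is hypothetical, and the three densities are made-up illustration values rather than readings from the plot.

```r
# Convert histogram densities to frequencies (counts).
# Assumes every bar has the same width; n and width default to this question's values.
density_to_freq = function(density, n = 50, width = 0.5) {
  density * n * width
}

# Made-up densities for three bars (illustration only, not read from the plot)
dens = c(0.04, 0.2, 0.52)
freq = density_to_freq(dens)
freq
## [1]  1  5 13
freq / 50   # divide by n to get back to proportions
```

    Once the counts are known, each prop[...] answer is the sum of the relevant counts divided by 50.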

  15. Question

    A standard eight-sided die was rolled 200 times, and the results were organized into the histogram shown below. In dice notation, we could say the results of 200d8 were plotted as a histogram. (Pedantic sidenote: it is common to interpret 200d8 as the SUM of 200 rolls, but we will interpret 200d8 as the LIST of 200 rolls and use $\sum 200\text{d}8$ as the sum of 200 rolls.)

    [Figure: histogram of the 200 rolls]


    1. Evaluate $\text{prop}[\mathbf{x}<4.5]$.
    2. Evaluate $\text{prop}[\mathbf{x}>3.5]$.
    3. Evaluate $\text{prop}[|\mathbf{x}-7|<0.5]$.
    4. Evaluate $\text{prop}[|\mathbf{x}-2|>0.5]$.
    5. Evaluate half-integer boundary $b$ such that $\text{prop}[\mathbf{x}<b] = 0.29$.
    6. Evaluate half-integer boundary $b$ such that $\text{prop}[\mathbf{x}>b] = 0.825$.
    7. Evaluate half-integer radius $r$ such that $\text{prop}[|\mathbf{x}-6|<r] = 0.41$.
    8. Evaluate integer radius $r$ such that $\text{prop}[|\mathbf{x}-4.5|>r] = 0.49$.

    Solution

    The first 4 questions involve adding up relative frequencies of the indicated intervals. The last 4 questions can be done by guessing and checking until something works.

    You may also find it helpful to determine frequencies (counts). To do this, multiply each relative frequency by the total number of measurements (200).

    [Figure: the same histogram, with frequencies]


    1. There are 82 measurements less than 4.5. So, $\text{prop}[\mathbf{x}<4.5]=\frac{82}{200}=0.41$.
    2. There are 142 measurements more than 3.5. So, $\text{prop}[\mathbf{x}>3.5]=\frac{142}{200}=0.71$.
    3. There are 27 measurements within 0.5 units of 7. So, $\text{prop}[|\mathbf{x}-7|<0.5]=\frac{27}{200}=0.135$.
    4. There are 183 measurements farther than 0.5 units from 2. So, $\text{prop}[|\mathbf{x}-2|>0.5]=\frac{183}{200}=0.915$.
    5. First, convert the proportion into a count: $0.29\cdot200=58$. A boundary at $b=3.5$ has 58 measurements less than it. So, $\text{prop}[\mathbf{x}<3.5]=0.29$.
    6. First, convert the proportion into a count: $0.825\cdot200=165$. A boundary at $b=2.5$ has 165 measurements more than it. So, $\text{prop}[\mathbf{x}>2.5]=0.825$.
    7. An interval with radius $r=1.5$ around center 6 contains 82 measurements. So, $\text{prop}[|\mathbf{x}-6|<1.5]=\frac{82}{200}=0.41$.
    8. An interval with radius $r=2$ around center 4.5 excludes 98 measurements. So, $\text{prop}[|\mathbf{x}-4.5|>2]=\frac{98}{200}=0.49$.
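    The guess-and-check for items 5 to 8 can be automated by trying every candidate boundary or radius and keeping the ones that hit the target. Below is a minimal R sketch. It assumes the raw rolls are available as a vector. The question only provides the histogram, so the rolls here are simulated with sample() purely for illustration and are NOT the plotted data. The helper names find_boundary_below and find_radius_outside are hypothetical.

```r
# Simulated stand-in for 200 rolls of a d8 (NOT the data behind the histogram)
set.seed(1)
x = sample(1:8, 200, replace = TRUE)

# Try each half-integer boundary b; keep those where prop[x < b] equals the target
find_boundary_below = function(x, target, candidates = seq(0.5, 8.5, by = 1)) {
  props = sapply(candidates, function(b) mean(x < b))
  candidates[abs(props - target) < 1e-9]
}

# Try each integer radius r; keep those where prop[|x - center| > r] equals the target
find_radius_outside = function(x, center, target, candidates = 0:8) {
  props = sapply(candidates, function(r) mean(abs(x - center) > r))
  candidates[abs(props - target) < 1e-9]
}

# Use a target we know is achievable, then recover the boundary that produced it
target = mean(x < 3.5)
find_boundary_below(x, target)
```

    With the real rolls, find_boundary_below(x, 0.29) should return 3.5, matching item 5. Comparing with a small tolerance instead of == avoids floating-point surprises.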

  16. Question

    A standard six-sided die was rolled 100 times, and the results were organized into the pie chart shown below. The outside of the circle marks the cumulative proportion.

    [Figure: pie chart of the 100 rolls, with cumulative proportions marked around the outside]


    1. Evaluate $\text{prop}[\mathbf{x}<5.5]$.
    2. Evaluate $\text{prop}[\mathbf{x}>3.5]$.
    3. Evaluate $\text{prop}[|\mathbf{x}-2.5|<1]$.
    4. Evaluate $\text{prop}[|\mathbf{x}-2.5|>1]$.
    5. Evaluate half-integer boundary $b$ such that $\text{prop}[\mathbf{x}<b] = 0.4$.
    6. Evaluate half-integer boundary $b$ such that $\text{prop}[\mathbf{x}>b] = 0.6$.
    7. Evaluate integer radius $r$ such that $\text{prop}[|\mathbf{x}-4.5|<r] = 0.36$.
    8. Evaluate integer radius $r$ such that $\text{prop}[|\mathbf{x}-3.5|>r] = 0.7$.

    Solution

    The first 4 questions involve adding up relative frequencies of the indicated intervals. The last 4 questions can be done by guessing and checking until something works.


    1. The proportion of measurements less than 5.5 is 0.88: $\text{prop}[\mathbf{x}<5.5]=0.88$.
    2. The proportion of measurements more than 3.5 is 0.48: $\text{prop}[\mathbf{x}>3.5]=0.48$.
    3. The proportion of measurements within 1 unit of 2.5 is 0.37: $\text{prop}[|\mathbf{x}-2.5|<1]=0.37$.
    4. The proportion of measurements farther than 1 unit from 2.5 is 0.63: $\text{prop}[|\mathbf{x}-2.5|>1]=0.63$.
    5. The proportion of measurements less than 2.5 is 0.4, so $b=2.5$: $\text{prop}[\mathbf{x}<2.5]=0.4$.
    6. The proportion of measurements more than 2.5 is 0.6, so $b=2.5$: $\text{prop}[\mathbf{x}>2.5]=0.6$.
    7. The proportion of measurements within 1 unit of 4.5 is 0.36, so $r=1$: $\text{prop}[|\mathbf{x}-4.5|<1]=0.36$.
    8. The proportion of measurements farther than 1 unit from 3.5 is 0.7, so $r=1$: $\text{prop}[|\mathbf{x}-3.5|>1]=0.7$.
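    The outside of the pie chart shows cumulative proportions, so the proportion for each face is the difference between consecutive cumulative values. The sketch below does this in R. The cumulative values are not printed in the text. They were back-calculated from the eight answers above, so treat them as a reconstruction rather than a reading of the chart. The helper name prop is hypothetical.

```r
# Cumulative proportions for faces 1..6 (reconstructed from the answers above)
cumulative = c(0.15, 0.40, 0.52, 0.70, 0.88, 1.00)
faces = 1:6
p = diff(c(0, cumulative))   # proportion of each individual face

# prop[condition]: add up the proportions of the faces where the condition holds
prop = function(condition) sum(p[condition])

prop(faces < 5.5)            # item 1
prop(abs(faces - 2.5) < 1)   # item 3: faces 2 and 3
prop(abs(faces - 3.5) > 1)   # item 8: faces 1, 2, 5, 6
```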

  17. Question

    Match the five histograms with their appropriate description.

    [Figure: five histograms, numbered 1 to 5]


    1. Uniform (Enter an integer between 1 and 5)
    2. Bell (Enter an integer between 1 and 5)
    3. Bimodal (Enter an integer between 1 and 5)
    4. Right-skew (Enter an integer between 1 and 5)
    5. Left-skew (Enter an integer between 1 and 5)

    Solution

    This is definitional.


    1. 2
    2. 3
    3. 5
    4. 4
    5. 1

  18. Question

    A sample of size $n=200$ was taken from an unknown population.

     4.08, 1.04, 1.72, 2.40, 9.14, 9.41, 1.14, 0.01, 9.07, 0.45,
     7.26, 0.30, 9.74, 0.37, 4.29, 1.08, 1.57, 6.62, 7.01, 7.00,
     1.22, 3.20, 0.27, 8.65, 6.00, 3.37, 9.94, 9.16, 9.98, 9.45,
     7.44, 1.42, 7.74, 1.65, 5.59, 0.80, 2.92, 7.38, 1.03, 9.85,
     5.09, 0.00, 3.83, 7.92, 0.80, 9.90, 9.38, 0.91, 0.26, 9.19,
     5.98, 7.95, 7.94, 9.69, 0.19, 4.04, 2.78, 9.51, 8.92, 1.74,
     9.50, 4.46, 0.54, 4.83, 0.71, 9.91, 2.69, 2.73, 5.21, 0.19,
     9.80, 3.59, 0.38, 7.36, 3.52, 5.03, 3.58, 2.74, 9.99, 4.77,
     3.30, 0.31, 2.16, 9.32, 3.06, 7.30, 1.58, 6.27, 3.40, 9.20,
     8.85, 5.92, 0.07, 2.94, 1.15, 8.53, 0.86, 2.40, 0.30, 7.94,
     0.02, 6.92, 0.39, 0.00, 8.74, 9.99, 0.54, 2.76, 0.75, 5.10,
     2.66, 0.00, 8.68, 9.95, 0.00, 2.08, 6.74,10.00, 3.42, 5.92,
     4.67, 0.65, 9.24, 9.18, 1.95, 5.18, 9.23, 8.16, 9.97, 4.84,
     4.24, 0.00, 9.98, 6.62, 5.97, 4.89, 8.76, 1.10, 9.25, 1.34,
     0.97, 0.92, 0.33, 5.43, 7.03, 3.93, 4.58, 7.51, 9.81, 9.28,
     9.52, 7.48, 2.42, 9.85, 3.40, 2.44, 3.18, 0.59, 6.57, 0.80,
     3.54, 3.07, 9.78, 4.58, 4.21, 9.65, 1.72, 6.36,10.00, 9.41,
     7.72, 8.80, 1.44, 7.51, 0.96, 5.66, 9.92, 7.80, 4.88, 1.48,
     7.26, 6.45, 7.27, 7.63, 0.30, 0.38, 7.11, 7.60, 6.08, 4.99,
     1.22, 9.18, 0.37, 7.43, 9.92, 8.78, 9.92, 1.08, 9.97,10.00

    You can download the data as a CSV. Determine which histogram visualizes the data, and describe the shape of the data.

    [Figure: candidate histograms]



    Solution

    You should make a histogram. This is easy in R.

    # Read the sample (column x of the CSV), then draw the histogram
    x = read.csv("make_hist.csv")$x
    hist(x)

    [Figure: output of hist(x)]

    Using a spreadsheet is way more work.
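    To see what hist() is counting, or to build the frequency table a spreadsheet would need, you can bin the data yourself with cut() and table(). The R sketch below assumes bins of width 1 from 0 to 10. The helper name bin_counts is hypothetical. Like hist()'s defaults, it uses right-closed intervals and includes the lowest break.

```r
# Count how many measurements fall in each bin (right-closed intervals, like hist())
bin_counts = function(x, lo = 0, hi = 10, width = 1) {
  breaks = seq(lo, hi, by = width)
  table(cut(x, breaks = breaks, right = TRUE, include.lowest = TRUE))
}

# The first ten measurements from the sample above
x_first = c(4.08, 1.04, 1.72, 2.40, 9.14, 9.41, 1.14, 0.01, 9.07, 0.45)
bin_counts(x_first)
```

    With the full sample, bin_counts(read.csv("make_hist.csv")$x) gives the counts for these ten bins. Those counts are what you would compare against the candidate histograms.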



  19. Question

    A sample was gathered.

    $$\mathbf{x} = 42.36, 30.23, 30.4, 31.81, 36.92, 32.16, 36.63, 30.57, 31.58, 32.83, 37.21, 40.49, 30.1$$

    You can download the data as a CSV file.

    Determine $\bar{x}$, the sample mean. Your answer can be rounded to the nearest tenth.


    Solution

    You need to sum the values ($\sum \mathbf{x}$, see summation) and divide by the sample size ($n$).

    $$\bar{x} = \frac{\sum \mathbf{x}}{n} = \frac{443.29}{13} = 34.0992$$

    You can round $\bar{x}$ to the nearest tenth: 34.1.

    [Figure: spreadsheet solution]

    In a spreadsheet, you can use the AVERAGE function. You can see a solution spreadsheet here.

    In R, you can use the mean() function.

    # First, get the csv into the working directory... then...
    data = read.csv("get_mean.csv")
    x = data$x
    xbar = mean(x)
    round(xbar,1)
    ## [1] 34.1
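    If the CSV is not at hand, the self-contained sketch below types the sample in and applies the definition (the sum divided by $n$) directly. It then checks the result against mean().

```r
# The sample, typed in so the example runs without the CSV
x = c(42.36, 30.23, 30.4, 31.81, 36.92, 32.16, 36.63, 30.57, 31.58, 32.83, 37.21, 40.49, 30.1)

n = length(x)        # sample size, 13
xbar = sum(x) / n    # definition of the mean: sum of the values over n
round(xbar, 1)
## [1] 34.1
all.equal(xbar, mean(x))   # agrees with the built-in mean()
## [1] TRUE
```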

  20. Question

    A sample was gathered.

    $$\mathbf{x} = 30.07, 33.17, 33.51, 48.59, 30.05, 34.38, 32.6, 40.89, 43.32, 33.18, 33.74, 31.1, 37.99$$

    You can download the data as a CSV file.

    Determine the sample median. Please enter an exact answer.


    Solution

    To determine the median by hand, first sort the sample.

    $$\text{sort}(\mathbf{x})=30.05, 30.07, 31.1, 32.6, 33.17, 33.18, 33.51, 33.74, 34.38, 37.99, 40.89, 43.32, 48.59$$

    If the sample size is odd, take the middle value (the $\frac{n+1}{2}$th value). If the sample size is even, take the mean of the middle two values (the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th values). Here $n=13$ is odd, so the median is the 7th sorted value, 33.51.

    [Figure: spreadsheet solution]

    In a spreadsheet, you can use the MEDIAN function. You can see a solution spreadsheet here.

    In R, you can use the median() function.

    # First, get the csv into the working directory... then...
    data = read.csv("get_median.csv")
    x = data$x
    median(x)
    ## [1] 33.51
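    The odd/even rule for the median can also be written out directly in R. This is a sketch, and the helper name median_by_rule is hypothetical. The built-in median() should give the same answer.

```r
# Median via the odd/even rule described above
median_by_rule = function(x) {
  s = sort(x)
  n = length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # odd n: the middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # even n: mean of the two middle values
  }
}

x = c(30.07, 33.17, 33.51, 48.59, 30.05, 34.38, 32.6, 40.89, 43.32, 33.18, 33.74, 31.1, 37.99)
median_by_rule(x)
## [1] 33.51
```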

  21. Question

    Depending on the type of distribution, we can make a strong claim regarding the mean and median.

    A sample was gathered and visualized with a histogram.

    [Figure: histogram of the sample]

    What claim can you make regarding the mean and median?


    1. $\text{mean} \approx \text{median}$
    2. $\text{mean} > \text{median}$
    3. $\text{mean} < \text{median}$

    Solution

    The distribution is skewed left. Its long tail of low values pulls the mean down, while the median stays with the bulk of the data. So, $\text{mean} < \text{median}$.


    1. FALSE
    2. FALSE
    3. TRUE
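    The tiny illustration below uses made-up numbers, not the sample in the histogram. A single low value drags the mean down, while the median stays with the bulk of the data.

```r
# Made-up left-skewed sample: most values are high, with one value far to the left
x = c(1, 8, 9, 9, 10)
mean(x)
## [1] 7.4
median(x)
## [1] 9
```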

  22. Question

    A sample was gathered (from a Bernoulli random variable).

    $$\mathbf{x} = 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0$$

    You can download the data as a CSV file.

    Determine $\bar{x}$, the sample mean. Actually, in this special case of 0s and 1s, the sample mean is called the sample proportion ($\hat{p}$ = “p hat”).

    $$\text{when the data are 0s and 1s:}\quad \bar{x}=\hat{p}$$

    So, determine the sample proportion. Your answer can be rounded to the nearest hundredth.


    Solution

    You need to sum the values ($\sum \mathbf{x}$, see summation) and divide by the sample size ($n$).

    $$\bar{x} = \frac{\sum \mathbf{x}}{n} = \frac{4}{12} = 0.3333$$

    In the context of 0s and 1s, it is more appropriate to use $\hat{p}$.

    $$\hat{p} = \frac{\#[\text{1s}]}{\#[\text{0s or 1s}]} = \frac{\text{number of successes}}{\text{number of attempts}}$$

    Notice: the mean of 0s and 1s is the proportion of 1s. This is a good reason to use 0 for FALSE/“fail” and 1 for TRUE/“success”.

    [Figure: spreadsheet solution]

    In a spreadsheet, you can use the AVERAGE function. You can see a solution spreadsheet here.

    In R, you can use the mean() function.

    # First, get the csv into the working directory... then...
    data = read.csv("get_mean_0s_1s.csv")
    x = data$x
    xbar = mean(x)
    phat = xbar # because the data are 0s and 1s
    round(phat,2)
    ## [1] 0.33
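    The self-contained version below types the sample in, so it runs without the CSV. It counts successes directly and confirms that the result matches the mean.

```r
# The sample from the question
x = c(0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0)

successes = sum(x == 1)   # number of 1s
attempts = length(x)      # number of 0s or 1s
phat = successes / attempts
round(phat, 2)
## [1] 0.33
all.equal(phat, mean(x))  # the mean of 0s and 1s is the proportion of 1s
## [1] TRUE
```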

  23. Question

    A sample was gathered.

    $$\mathbf{x} = 34.97, 34.99, 32.15, 29.31, 34.97, 33.07, 33.37, 29.89, 34.84, 34.80, 31.92$$

    You can download the data as a CSV file.

    Determine the sample range. Please enter an exact answer.


    Solution

    To determine the range by hand, subtract the minimum value from the maximum value. You could first sort all the data.

    $$\text{sort}(\mathbf{x}) = 29.31, 29.89, 31.92, 32.15, 33.07, 33.37, 34.80, 34.84, 34.97, 34.97, 34.99$$

    Determine the minimum.

    $$\text{min}(\mathbf{x}) = 29.31$$

    Determine the maximum.

    $$\text{max}(\mathbf{x}) = 34.99$$

    Take the difference.

    $$\text{range}(\mathbf{x}) = \text{max}(\mathbf{x})-\text{min}(\mathbf{x}) = 34.99-29.31 = 5.68$$

    Spreadsheet

    In a spreadsheet, you can use the MIN function and MAX function.

    [Figure: spreadsheet solution using MIN and MAX]

    You can see a solution spreadsheet here.

    R

    In R, you can use the min() and max() functions.

    # First, get the csv into the working directory... then...
    data = read.csv("get_range.csv")
    x = data$x
    # Use a new name so the result does not mask R's built-in range() function
    sample_range = max(x)-min(x)
    print(sample_range)
    ## [1] 5.68
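    Base R also has a range() function. It returns the pair c(min, max) rather than their difference, so the sample range in the sense above is diff(range(x)). Below is a self-contained check with the sample typed in.

```r
x = c(34.97, 34.99, 32.15, 29.31, 34.97, 33.07, 33.37, 29.89, 34.84, 34.80, 31.92)

range(x)         # base R's range() gives c(min, max), not the difference
## [1] 29.31 34.99
diff(range(x))   # max minus min: the sample range
## [1] 5.68
```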

  24. Question

    A sample was gathered.

    $$\mathbf{x} = 72.65, 67.56, 71.47, 68.35, 64.66, 63.62, 61.59, 74.02, 66.97, 65.00, 65.44, 69.15, 60.46, 74.48$$

    You can download the data as a CSV file.

    Determine the sample’s mean absolute deviation (MAD) around the mean. You can round your answer to the hundredths place.


    Solution

    First, determine the mean of the sample.

    $$\bar{x} = \frac{\sum \mathbf{x}}{n} = 67.53$$

    Determine the absolute deviations (distances from $\bar{x}$).

    $i$   $\mathbf{x}$   deviations = $\mathbf{x}-\bar{x}$   AbsDev = $\vert\mathbf{x}-\bar{x}\vert$
    1 72.65 5.12 5.12
    2 67.56 0.03 0.03
    3 71.47 3.94 3.94
    4 68.35 0.82 0.82
    5 64.66 -2.87 2.87
    6 63.62 -3.91 3.91
    7 61.59 -5.94 5.94
    8 74.02 6.49 6.49
    9 66.97 -0.56 0.56
    10 65.00 -2.53 2.53
    11 65.44 -2.09 2.09
    12 69.15 1.62 1.62
    13 60.46 -7.07 7.07
    14 74.48 6.95 6.95

    Now, take the mean of the absolute deviations.

    $$\text{MAD} = \frac{\sum \vert\mathbf{x}-\bar{x}\vert}{n} = \frac{49.94}{14} = 3.5671429$$

    Rounded to the hundredths place, $\text{MAD} \approx 3.57$.

    Spreadsheet

    You can do this with a spreadsheet.

    [Figure: spreadsheet solution]

    R

    You can do this with R.

    x = read.csv("get_MAD.csv")$x
    xbar = mean(x)
    deviations = x-xbar
    AbsDev = abs(deviations)
    MAD = mean(AbsDev)
    x
    ##  [1] 72.65 67.56 71.47 68.35 64.66 63.62 61.59 74.02 66.97 65.00 65.44
    ## [12] 69.15 60.46 74.48
    xbar
    ## [1] 67.53
    deviations
    ##  [1]  5.12  0.03  3.94  0.82 -2.87 -3.91 -5.94  6.49 -0.56 -2.53 -2.09
    ## [12]  1.62 -7.07  6.95
    AbsDev
    ##  [1] 5.12 0.03 3.94 0.82 2.87 3.91 5.94 6.49 0.56 2.53 2.09 1.62 7.07
    ## [14] 6.95
    MAD
    ## [1] 3.567143
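    One caution for R users: the built-in mad() is not this statistic. stats::mad() computes the median absolute deviation (around the median by default, scaled by the constant 1.4826), so it will not reproduce the answer here. Below is a small helper for the mean absolute deviation around the mean. The name mean_abs_dev is hypothetical.

```r
# Mean absolute deviation around the mean (NOT the same as stats::mad)
mean_abs_dev = function(x) mean(abs(x - mean(x)))

x = c(72.65, 67.56, 71.47, 68.35, 64.66, 63.62, 61.59, 74.02, 66.97, 65.00, 65.44, 69.15, 60.46, 74.48)
round(mean_abs_dev(x), 2)
## [1] 3.57
```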

  25. Question

    A sample was gathered.

    $$\mathbf{x} = 65.96, 68.45, 65.81, 69.73, 65.14, 67.83, 61.32, 66.27, 62.50$$

    You can download the data as a CSV file.

    Determine the biased sample variance (without Bessel correction). You can round your answer to the hundredths place.


    Solution

    First, determine the mean of the sample.

    $$\bar{x} = \frac{\sum \mathbf{x}}{n} = 65.89$$

    Determine the squared deviations (squared distances from $\bar{x}$).

    $i$   $\mathbf{x}$   deviations = $\mathbf{x}-\bar{x}$   SqrDev = $(\mathbf{x}-\bar{x})^2$
    1 65.96 0.07 0.0049
    2 68.45 2.56 6.5536
    3 65.81 -0.08 0.0064
    4 69.73 3.84 14.7456
    5 65.14 -0.75 0.5625
    6 67.83 1.94 3.7636
    7 61.32 -4.57 20.8849
    8 66.27 0.38 0.1444
    9 62.50 -3.39 11.4921

    Now, take the mean of the squared deviations.

    $$\text{VAR} = \frac{\sum (\mathbf{x}-\bar{x})^2}{n} = \frac{58.158}{9} = 6.462$$

    Rounded to the hundredths place, $\text{VAR} \approx 6.46$.

    Spreadsheet

    You can do this with a spreadsheet.

    [Figure: spreadsheet solution]

    And, actually, you can skip a lot of work by using the VAR.P function. Using the population variance formula is equivalent to using the biased sample variance formula.

    R

    You can do this with R.

    x = read.csv("get_VAR.csv")$x
    xbar = mean(x)
    deviations = x-xbar
    sqrdev = deviations^2
    VAR = mean(sqrdev)
    x
    ## [1] 65.96 68.45 65.81 69.73 65.14 67.83 61.32 66.27 62.50
    xbar
    ## [1] 65.89
    deviations
    ## [1]  0.07  2.56 -0.08  3.84 -0.75  1.94 -4.57  0.38 -3.39
    sqrdev
    ## [1]  0.0049  6.5536  0.0064 14.7456  0.5625  3.7636 20.8849  0.1444
    ## [9] 11.4921
    VAR
    ## [1] 6.462
    # The built-in var() function applies the Bessel correction (it divides by n-1). To use it here, undo the correction by multiplying by (n-1)/n.
    n = length(x)
    var(x)*(n-1)/(n)
    ## [1] 6.462

  26. Question

    A sample was gathered.

    $$\mathbf{x} = 54.13, 54.72, 53.93, 54.16, 55.00, 53.81, 54.50, 54.79$$

    You can download the data as a CSV file.

    Determine the unbiased sample variance (with Bessel correction). You can round your answer to the hundredths place.

    You probably wonder why you would make the Bessel correction. The reason is important. We will see that the main goal of statistics is to infer the underlying probability distribution of a collection of empirical observations. In other words, we have a lottery machine filled with many balls (population), but we only see a small sample of those balls, and our goal is to guess what the population looks like based on a small sample (see statistical inference).

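    It turns out that when guessing the population’s variance from a small sample, your guess is better after making the Bessel correction. Dividing by $n-1$ instead of $n$ gives an unbiased estimate: averaged over many samples, it equals the population variance. Dividing by $n$ underestimates it on average.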


    Solution

    First, determine the mean of the sample.

    $$\bar{x} = \frac{\sum \mathbf{x}}{n} = 54.38$$

    Determine the squared deviations (squared distances from $\bar{x}$).

    $i$   $\mathbf{x}$   deviations = $\mathbf{x}-\bar{x}$   SqrDev = $(\mathbf{x}-\bar{x})^2$
    1 54.13 -0.25 0.0625
    2 54.72 0.34 0.1156
    3 53.93 -0.45 0.2025
    4 54.16 -0.22 0.0484
    5 55.00 0.62 0.3844
    6 53.81 -0.57 0.3249
    7 54.50 0.12 0.0144
    8 54.79 0.41 0.1681

    Determine the unbiased sample variance by summing the squared deviations and dividing the sum by $n-1$.

    $$s^2 = \frac{\sum (\mathbf{x}-\bar{x})^2}{n-1} = \frac{1.3208}{8-1} = 0.1886857$$

    Rounded to the hundredths place, $s^2 \approx 0.19$.

    Spreadsheet

    You can do this with a spreadsheet.

    [Figure: spreadsheet solution]

    And, actually, you can skip a lot of work by using the VAR function.

    R

    You can do this with R.

    x = read.csv("get_VAR.csv")$x
    var(x)
    ## [1] 0.1886857

    You could also do it the long way.

    x = read.csv("get_VAR.csv")$x
    n = length(x)
    xbar = mean(x)
    deviations = x-xbar
    sqrdev = deviations^2
    VAR = sum(sqrdev)/(n-1)
    VAR
    ## [1] 0.1886857
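    The claim that the Bessel-corrected guess is better can be checked by simulation. Draw many small samples from a population whose variance is known, and compare the average of each estimator with the true value. The sketch below assumes a normal population with $\sigma^2 = 1$ and samples of size 8 (the same $n$ as this question). The choice of population is an illustrative assumption.

```r
set.seed(2024)
n = 8          # sample size, as in this question
reps = 20000   # number of simulated samples

# For each simulated sample of size n from a population with variance 1,
# compute the unbiased variance (divides by n - 1) ...
unbiased = replicate(reps, var(rnorm(n)))
# ... and rescale it to the biased version (divides by n)
biased = unbiased * (n - 1) / n

mean(unbiased)   # close to the true variance, 1
mean(biased)     # close to (n - 1)/n = 0.875: too small on average
```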

  27. Question

    A sample was gathered.

    $$\mathbf{x} = 50.07, 52.02, 56.29, 53.73, 50.38, 53.85, 50.51, 53.98, 51.71, 50.64, 50.05, 53.29$$

    You can download the data as a CSV file.

    Determine the uncorrected sample standard deviation (without Bessel correction, biased).

    $$\text{SD}_\text{biased} = \sqrt{\frac{1}{n}\sum(\mathbf{x}-\bar{x})^2}$$

    You can round your answer to the hundredths place.


    Solution

    Determine the mean of the sample.

    $$\bar{x} = \frac{\sum \mathbf{x}}{n} = 52.21$$

    Determine the squared deviations (squared distances from $\bar{x}$).

    $i$   $\mathbf{x}$   deviations = $\mathbf{x}-\bar{x}$   SqrDev = $(\mathbf{x}-\bar{x})^2$
    1 50.07 -2.14 4.5796
    2 52.02 -0.19 0.0361
    3 56.29 4.08 16.6464
    4 53.73 1.52 2.3104
    5 50.38 -1.83 3.3489
    6 53.85 1.64 2.6896
    7 50.51 -1.70 2.8900
    8 53.98 1.77 3.1329
    9 51.71 -0.50 0.2500
    10 50.64 -1.57 2.4649
    11 50.05 -2.16 4.6656
    12 53.29 1.08 1.1664

    Find the mean of the squared deviations.

    $$\text{VAR}_\text{biased} = \frac{\sum (\mathbf{x}-\bar{x})^2}{n} = \frac{44.1808}{12} = 3.6817333$$

    Take the square root of the variance.

    $$\text{SD}_\text{biased} = \sqrt{\frac{\sum(\mathbf{x}-\bar{x})^2}{n}} = \sqrt{3.6817333} = 1.9187843$$

    Rounded to the hundredths place, $\text{SD}_\text{biased} \approx 1.92$.

    Spreadsheet

    You can do this with a spreadsheet.

    [Figure: spreadsheet solution]

    And, actually, you can skip a lot of work by using the STDEV.P function. Using the population standard deviation formula is equivalent to using the biased sample standard deviation formula.

    R

    You can do this with R.

    x = read.csv("get_SD.csv")$x
    xbar = mean(x)
    deviations = x-xbar
    sqrdev = deviations^2
    VAR_biased = mean(sqrdev)
    SD_biased = sqrt(VAR_biased)
    SD_biased
    ## [1] 1.918784
    # This could also be done with a one-liner
    x = read.csv("get_SD.csv")$x
    SD_biased2 = sqrt(mean((x-mean(x))^2))
    SD_biased2
    ## [1] 1.918784
    # The built-in sd() function applies the Bessel correction (it divides by n-1 inside the square root). To use it here, undo the correction by multiplying by sqrt((n-1)/n).
    x = read.csv("get_SD.csv")$x
    n = length(x)
    SD_biased3 = sd(x)*sqrt((n-1)/(n))
    SD_biased3
    ## [1] 1.918784

  28. Question

    A sample was gathered.

    $$\mathbf{x} = 20, 31, 43, 39, 45, 21, 37, 20, 20, 45, 20$$

    You can download the data as a CSV file.

    Determine the corrected sample standard deviation (with Bessel correction). You can round your answer to the hundredths place.

    You probably wonder why you would make the Bessel correction. The reason is important. We will see that the main goal of statistics is to infer the underlying probability distribution of a collection of empirical observations. In other words, we have a lottery machine filled with many balls (population), but we only see a small sample of those balls, and our goal is to guess what the population looks like based on a small sample (see statistical inference).

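    It turns out that when guessing the population’s standard deviation from a small sample, your guess is better after making the Bessel correction. Dividing by $n-1$ makes the variance estimate unbiased. Its square root, $s$, still tends to be slightly low, but less so than the uncorrected standard deviation.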


    Solution

    First, determine the mean of the sample.

    $$\bar{x} = \frac{\sum \mathbf{x}}{n} = 31$$

    Determine the squared deviations (squared distances from $\bar{x}$).

    $i$   $\mathbf{x}$   deviations = $\mathbf{x}-\bar{x}$   SqrDev = $(\mathbf{x}-\bar{x})^2$
    1 20 -11 121
    2 31 0 0
    3 43 12 144
    4 39 8 64
    5 45 14 196
    6 21 -10 100
    7 37 6 36
    8 20 -11 121
    9 20 -11 121
    10 45 14 196
    11 20 -11 121

    Determine the unbiased sample variance by summing the squared deviations and dividing the sum by $n-1$.

    $$s^2 = \frac{\sum (\mathbf{x}-\bar{x})^2}{n-1} = \frac{1220}{11-1} = 122$$

    Determine the corrected sample standard deviation by taking the square root of the unbiased sample variance.

    $$s=\sqrt{s^2}=\sqrt{122}=11.045361$$

    Rounded to the hundredths place, $s \approx 11.05$.

    Spreadsheet

    You can do this with a spreadsheet.

    [Figure: spreadsheet solution]

    And, actually, you can skip a lot of work by using the STDEV function.

    R

    You can do this with R.

    x = read.csv("get_SD.csv")$x
    sd(x)
    ## [1] 11.04536

    You could also do it the long way.

    x = read.csv("get_SD.csv")$x
    n = length(x)
    xbar = mean(x)
    deviations = x-xbar
    sqrdev = deviations^2
    VAR = sum(sqrdev)/(n-1)
    SD = sqrt(VAR)
    SD
    ## [1] 11.04536

  29. Question

    A large sample ($n=10000$) was gathered and visualized with a histogram.

    [Figure: histogram of the 10000 measurements]

    When a symmetric distribution is sampled thoroughly ($n>100$), you can use the following rules of thumb to estimate the mean and standard deviation.

    Shape | Estimated mean | Estimated standard deviation
    Bell | $\frac{\min(x)+\max(x)}{2}$ | $\frac{\max(x)-\min(x)}{6}$
    Uniform | $\frac{\min(x)+\max(x)}{2}$ | $\frac{\max(x)-\min(x)}{\sqrt{12}}$
    Bimodal | $\frac{\min(x)+\max(x)}{2}$ | $\frac{\max(x)-\min(x)}{2}$

    1. Estimate the mean of the sample. (Please use the given formula.)
    2. Estimate the standard deviation of the sample. (Please use the given formula.)

    Solution

    Notice the bell has the smallest standard deviation for a given range. This is because many measurements are near the mean, and just a few are near the edges.

    The bimodal has the largest standard deviation for a given range. This is because many measurements are near the edges, and just a few are near the mean.

    The uniform distribution has about equal numbers of measurements near the mean and near the edges, so it has a standard deviation between the two other shapes (for a given range).

    You’ll notice the bimodal estimate is probably the worst estimate. The estimate would be more accurate if none of the values were near the mean, and all the values were near the edge.


    1. By using the given formula, you’ll get an estimate of 35 for the mean. The actual mean is 35.0013734.
    2. By using the formula, you’ll get an estimate of 1.6666667 for the standard deviation. The actual standard deviation is 1.67505.
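    The rules of thumb in the table can be wrapped in a small R function. This is a sketch. The name rule_of_thumb is hypothetical, and the min and max in the example (30 and 40) are made-up inputs, not values read from the histogram.

```r
# Rule-of-thumb estimates of the mean and SD from the min, max, and shape
rule_of_thumb = function(min_x, max_x, shape = c("bell", "uniform", "bimodal")) {
  shape = match.arg(shape)
  # Divisor for the range, from the table: 6 (bell), sqrt(12) (uniform), 2 (bimodal)
  divisor = switch(shape, bell = 6, uniform = sqrt(12), bimodal = 2)
  c(mean = (min_x + max_x) / 2, sd = (max_x - min_x) / divisor)
}

rule_of_thumb(30, 40, "bell")
rule_of_thumb(30, 40, "uniform")
```

    For a fixed range, the estimated SD grows from bell to uniform to bimodal, matching the explanation above.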

  30. Question

    Three large samples were taken from three different populations. Their distributions are shown as histograms.

    [Figure: histograms of samples X, Y, and Z]

    1. Which sample has the smallest mean? X / Y / Z
    2. Which sample has the largest mean? X / Y / Z
    3. Which sample has the smallest standard deviation? X / Y / Z
    4. Which sample has the largest standard deviation? X / Y / Z


    Solution

    All three distributions appear to have a similar uniform shape, but their centers and spreads all differ.


    1. The distribution whose center sits farthest left has the smallest mean.
    2. The distribution whose center sits farthest right has the largest mean.
    3. The narrowest distribution has the smallest standard deviation. It is this simple because all the distributions have the same shape.
    4. The widest distribution has the largest standard deviation. It is this simple because all the distributions have the same shape.

  31. Question

    Three large samples were taken from three different populations. Their distributions are shown as histograms.

    [Figure: histograms of samples X, Y, and Z]

    1. Which sample has the smallest mean? X / Y / Z
    2. Which sample has the largest mean? X / Y / Z
    3. Which sample has the smallest standard deviation? X / Y / Z
    4. Which sample has the largest standard deviation? X / Y / Z


    Solution

    All three distributions look like they have similar ranges (widths), but different shapes. So, we will use the fact that for a given range, bell shape has the smallest standard deviation and bimodal has the largest standard deviation. This is because a bell shape has many measurements near the middle, whereas the bimodal shape has many measurements near the edges of its interval.


    1. The distribution whose center sits farthest left has the smallest mean.
    2. The distribution whose center sits farthest right has the largest mean.
    3. The bell shape has the smallest standard deviation. It is this simple because all the distributions have the same range, but the bell has a higher fraction of measurements near its mean.
    4. The bimodal shape has the largest standard deviation. It is this simple because all the distributions have the same range, but the bimodal shape has a higher fraction of measurements far from its mean.
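    The claim that, for a fixed range, the bell has the smallest and the bimodal shape the largest standard deviation can be checked on three tiny made-up samples. These are not the samples in the histograms. Each one runs from 0 to 10 and has mean 5.

```r
# Three made-up samples with the same range (0 to 10) but different shapes
bell = c(0, 4, 5, 5, 5, 5, 6, 10)           # piled up near the middle
uniform = seq(0, 10, length.out = 8)        # evenly spread
bimodal = c(0, 0, 1, 1, 9, 9, 10, 10)       # piled up near the edges

sds = sapply(list(bell = bell, uniform = uniform, bimodal = bimodal), sd)
sds
```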

  32. Question

    Background

    Sometimes a population is well characterized. In this case we know its mean and standard deviation. We use Greek letters $\mu$ (“mu”) and $\sigma$ (“sigma”) when describing the population (instead of the $\bar{x}$ (“xbar”) and $s$ used for the sample mean and sample standard deviation).

    $$\mu = \text{population mean}$$

    $$\sigma = \text{population standard deviation}$$

    When measuring individuals from a population, we expect most measurements to be within the interval of typical measurements. We define the interval of typical measurements (using interval notation) as follows.

    $$\text{interval of typical measurements} = (\mu-2\sigma,\,\mu+2\sigma)$$

    In other words, we expect most measurements to be between two bounds.

    $$\text{lower bound of interval of typical measurements} = \mu-2\sigma$$

    $$\text{upper bound of interval of typical measurements} = \mu+2\sigma$$

    Actual Question

    A population of lizards has a mean length of $\mu = 77.1$ cm and a standard deviation of $\sigma=8.3$ cm. Determine the interval of typical measurements.


    1. Determine the lower bound of the interval of typical measurements.
    2. Determine the upper bound of the interval of typical measurements.

    Solution

    You need to use the formulas. Remember your order of operations!


    1. Use the formula. $\mu-2\sigma = 77.1 - 2(8.3) = 60.5$
    2. Use the formula. $\mu+2\sigma = 77.1 + 2(8.3) = 93.7$
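    The same arithmetic in R follows the usual order of operations, with multiplication before addition and subtraction.

```r
mu = 77.1     # population mean length (cm)
sigma = 8.3   # population standard deviation (cm)

c(lower = mu - 2 * sigma, upper = mu + 2 * sigma)
```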

  33. Question

    The following spinner has a population mean $\mu = 65$ and a population standard deviation $\sigma=2$. We can think of a spinner as an infinite population from which we can take as many independent measurements as we want.

    [Figure: spinner]

    A sample (200 measurements) was taken.

    65.5914, 69.6591, 62.3152, 62.0706, 64.2983, 66.9836, 67.3504, 64.5599, 64.4641, 65.8275,
    62.4167, 66.1170, 63.8299, 61.2904, 67.3588, 63.3710, 65.2219, 66.6059, 62.7939, 61.4353,
    68.0023, 65.2899, 67.3512, 65.9362, 65.2307, 68.4660, 65.6860, 66.3906, 65.8748, 62.7641,
    62.5626, 67.4819, 67.3861, 65.0441, 63.3612, 67.0488, 66.5450, 65.0290, 64.0181, 66.5035,
    66.9666, 64.7889, 66.4093, 66.8305, 61.2497, 63.8005, 62.5556, 61.5403, 65.5833, 64.6528,
    66.3852, 63.9360, 67.2303, 62.4962, 64.8687, 67.8652, 62.9088, 68.9563, 62.7787, 64.8576,
    63.8498, 63.5096, 65.7014, 66.0777, 63.1418, 64.9178, 65.6313, 66.4512, 68.1289, 67.5895,
    65.9020, 66.0343, 64.1611, 68.5267, 65.0194, 63.8384, 63.3480, 65.3929, 64.9688, 63.9628,
    66.1697, 66.4425, 66.1182, 63.8994, 64.8656, 59.4777, 64.8277, 64.5477, 62.8686, 64.6629,
    64.1792, 62.8685, 65.2674, 65.4781, 62.3302, 63.9020, 65.5931, 63.7468, 70.1651, 67.6230,
    62.7863, 59.7476, 63.8277, 62.8221, 64.5745, 62.7795, 65.4342, 65.2129, 65.8897, 66.0036,
    62.6035, 65.4064, 65.1463, 66.0459, 61.7113, 63.5175, 65.6731, 68.7174, 66.9093, 67.2812,
    70.8647, 67.4793, 68.6700, 68.3615, 63.8713, 65.2969, 66.6215, 61.9097, 64.6685, 63.6505,
    66.3863, 63.7009, 64.4872, 67.5388, 68.6062, 65.5804, 64.6326, 68.7246, 71.2573, 67.8468,
    66.1906, 66.2312, 65.7185, 64.4274, 67.3468, 62.1675, 64.7659, 61.8738, 67.8193, 65.1262,
    68.0571, 63.0963, 62.6560, 61.7648, 66.6956, 69.2242, 67.5693, 65.5108, 61.6663, 65.0250,
    63.8676, 67.8875, 65.0600, 65.7887, 64.6236, 64.1440, 65.4003, 65.6019, 65.6642, 64.6577,
    67.6506, 64.4485, 65.4193, 68.8963, 66.0985, 64.4691, 62.0262, 66.9004, 63.8215, 64.4759,
    65.3226, 61.4894, 68.3864, 68.1937, 63.7190, 69.1519, 64.8021, 69.4722, 64.5362, 64.1693,
    61.4202, 64.7983, 63.6826, 62.4361, 63.5248, 61.8594, 65.0435, 65.7796, 61.2055, 62.0183

    You can download the data as a CSV file.

    Actual question

    What proportion of the 200 measurements are outside the interval of typical measurements?


    Solution

    First, determine the interval of typical measurements.

    $$\begin{aligned} \text{interval of typical measurements} &= (\mu-2\sigma,\,\mu+2\sigma)\\ &= (65-2(2),\,65+2(2))\\ &= (61,69) \end{aligned}$$

    Now, count how many measurements are either less than 61 or more than 69, and divide by 200 to get the proportion. You’ll want to use a computer.

    R

    x = read.csv("check_interval_typical_measurements.csv")$x
    n = length(x)
    count_outside = sum(x<61 | x>69)
    prop_outside = count_outside/n
    print(prop_outside)
    ## [1] 0.045

    In R, the “|” operator means “or”.

    You could also write the inequality as an absolute deviation from the mean. The interval $(61,69)$ is centered at 65 with a radius of 4 ($2\sigma$). Any measurement more than 4 units from 65 (in either direction) is therefore outside the interval.

    x = read.csv("check_interval_typical_measurements.csv")$x
    n = length(x)
    count_outside = sum( abs(x-65)>4 )
    prop_outside = count_outside/n
    print(prop_outside)
    ## [1] 0.045
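    R treats TRUE as 1 and FALSE as 0, so the mean of a logical vector is the proportion of TRUE values. That lets you count and divide by n in one step. The sketch below assumes $\mu=65$ and $\sigma=2$, which are the values implied by the interval $(61,69)$. The helper name prop_outside_typical and the toy vector are hypothetical and are included only for illustration.

```r
# Hypothetical helper: proportion of measurements outside (mu - 2*sigma, mu + 2*sigma).
# mean() of a logical vector returns the proportion of TRUE values.
prop_outside_typical <- function(x, mu, sigma) {
  mean(abs(x - mu) > 2 * sigma)
}

# Toy data (not the exam data): 60.5 and 69.5 are the only values outside (61, 69)
x_toy <- c(60.5, 62, 64, 65, 66, 68, 69.5, 63)
prop_outside_typical(x_toy, mu = 65, sigma = 2)
# [1] 0.25
```

    With mu = 65 and sigma = 2, the condition abs(x - 65) > 4 is equivalent to x<61 | x>69. Applied to the exam data, it should therefore reproduce the proportion computed above.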

    Spreadsheet

    In a spreadsheet, you can use the IF function together with the OR function to mark each measurement under 61 or over 69 with a 1 (and every other measurement with a 0). Then use the SUM function to count the 1s.

    [Figure: spreadsheet solution using IF, OR, and SUM]

    You can download this solution CSV.

    Another (simpler?) way is to use the COUNTIF function with the ABS function.

    [Figure: spreadsheet solution using COUNTIF and ABS]

    You can download this second solution CSV.


  34. Question

    A sample was gathered.

    $$\mathbf{x} = 30.54, 45.49, 39.71, 40.32, 49.89, 45.38, 31.16$$

    You can download the data as a CSV file.

    Determine the sample interquartile range (IQR).

    Warning: various definitions of IQR exist, based on arbitrary decisions made in defining the quantile function or other definitions of quartiles. I will make the answer’s tolerance large enough to accept most (hopefully all) methods.


    Solution

    Method 1: method of medians

    This method is described in the Wikipedia page on IQR.

    This method relies on first determining the size of each half. In the version used here, when $n$ is odd, the overall median is left out of both halves, so each half contains $\lfloor n/2 \rfloor$ values.

    You find the median of the lowest half of the values and the median of the highest half of the values. The IQR is the difference between those two medians.

    In this case, $n=7$, so $Q_1$ is the median of the lowest 3 numbers and $Q_3$ is the median of the highest 3 numbers.

    Method 1 by hand

    Method 1 is easiest to do by hand.

    $$\text{sort}(\mathbf{x}) = 30.54, 31.16, 39.71, 40.32, 45.38, 45.49, 49.89$$

    Because there are 7 values, the first quartile is the median of the lowest 3 values and the third quartile is the median of the highest 3 values.

    $$Q_1 = \text{median}(30.54, 31.16, 39.71) = 31.16$$
    $$Q_3 = \text{median}(45.38, 45.49, 49.89) = 45.49$$

    The IQR is the difference between $Q_3$ and $Q_1$.
    $$\text{IQR} = 45.49-31.16 = 14.33$$

    Method 1 with spreadsheet

    Unfortunately, the built-in QUARTILE function does not use the method of medians (more about this in Method 2).

    We sort the data, determine the median of the lowest 3 values, determine the median of the highest 3 values, and take a difference.

    You can see a solution spreadsheet.

    [Figure: solution spreadsheet for Method 1]

    Method 1 with R

    Again, the built-in IQR() function does not follow the method of medians, so Method 1 takes a bit more work in R. The following code should be relatively easy to understand, although it uses floor rounding, subsetting, and the colon operator.

    data = read.csv("get_IQR.csv")
    x = data$x
    n = length(x)
    x_sorted = sort(x)
    halfsize = floor(n/2) 
    Q1 = median(x_sorted[1:halfsize])
    Q3 = median(x_sorted[(n-halfsize+1):n])
    iqr = Q3-Q1
    print(iqr)
    ## [1] 14.33
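    If you need the method of medians repeatedly, you can wrap the same steps in a function. This is a minimal sketch, and the name iqr_medians is hypothetical. Like the code above, it leaves the overall median out of both halves when $n$ is odd.

```r
# Hypothetical helper: IQR by the method of medians (Method 1).
iqr_medians <- function(x) {
  stopifnot(length(x) >= 2)                      # each half needs at least one value
  x_sorted <- sort(x)
  n <- length(x_sorted)
  halfsize <- floor(n / 2)                       # size of each half
  Q1 <- median(x_sorted[1:halfsize])             # median of the lowest half
  Q3 <- median(x_sorted[(n - halfsize + 1):n])   # median of the highest half
  Q3 - Q1
}

x <- c(30.54, 45.49, 39.71, 40.32, 49.89, 45.38, 31.16)
iqr_medians(x)       # 14.33, as in the hand calculation
iqr_medians(1:8)     # even n: median(5:8) - median(1:4) = 6.5 - 2.5 = 4
```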

    Method 2: built-in functions

    Method 2 with spreadsheet

    We can find the quartiles with built-in functions of a spreadsheet.

    [Figure: spreadsheet solution using the built-in quartile functions]

    Download this solution

    You’ll notice this gives a different answer than Method 1.

    Method 2 with R

    R has 9 different built-in quantile definitions. You select one with the type argument (1 through 9) of quantile() and IQR().

    # Personally, I like the 5th option... the default is 7... some smart researchers suggest 8...
    x = read.csv("get_IQR.csv")$x
    IQR1 = IQR(x,type=1)
    IQR2 = IQR(x,type=2)
    IQR3 = IQR(x,type=3)
    IQR4 = IQR(x,type=4)
    IQR5 = IQR(x,type=5)
    IQR6 = IQR(x,type=6)
    IQR7 = IQR(x,type=7)
    IQR8 = IQR(x,type=8)
    IQR9 = IQR(x,type=9)
    cat(c(IQR1,IQR2,IQR3,IQR4,IQR5,IQR6,IQR7,IQR8,IQR9))
    ## 14.33 14.33 14.22 14.4025 12.165 14.33 10 12.88667 12.70625
    #The default is type 7...
    x = read.csv("get_IQR.csv")$x
    IQR_default = IQR(x)
    IQR_default
    ## [1] 10
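    You can also write the nine calls above as a single sapply() over the type argument. The sketch below also uses quantile() to show where the type 7 answer of 10 comes from: it is the difference between the type 7 third and first quartiles.

```r
x <- c(30.54, 45.49, 39.71, 40.32, 49.89, 45.38, 31.16)

# IQR under each of R's nine quantile definitions
iqr_by_type <- sapply(1:9, function(t) IQR(x, type = t))
names(iqr_by_type) <- paste0("type", 1:9)
print(round(iqr_by_type, 5))

# Type 7 (the default): Q1 = 35.435 and Q3 = 45.435, so IQR = 10
q7 <- quantile(x, probs = c(0.25, 0.75), type = 7)
print(q7)
unname(q7[2] - q7[1])
```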

    TMI

    To really understand what is happening, I think it helps to visualize the quantile functions. Let’s use types 1, 5, and 7. Also, remember the sorted sample:

    ## 30.54 31.16 39.71 40.32 45.38 45.49 49.89

    Type 1 is based on the empirical cumulative distribution function, which is a step function.

    [Figure: type 1 quantile function of the sample]

    Types 5 and 7 are based on continuous (piecewise linear) versions of the empirical cumulative distribution.

    [Figure: type 5 quantile function of the sample]

    [Figure: type 7 quantile function of the sample]

    TL;DR

    Any of the following answers are accepted:

    ## 14.33 10 14.33 14.33 14.22 14.4025 12.165 14.33 10 12.88667 12.70625

  35. Question

    A sample was gathered.

    $$\mathbf{x} = 0, 0, 1, 1, 1, 1, 0, 0$$

    You can download the data as a CSV file.

    Determine the sample’s mean absolute deviation (MAD) around the sample proportion. You can round your answer to the hundredths place.


    Solution

    First, determine the sample proportion (mean of 0s and 1s).

    $$\hat{p} = \frac{4}{8} = 0.5$$

    Determine the absolute deviations (distances from $\hat{p}$).

    i   x   deviation ($\mathbf{x}-\hat{p}$)   AbsDev ($|\mathbf{x}-\hat{p}|$)
    1 0 -0.5 0.5
    2 0 -0.5 0.5
    3 1 0.5 0.5
    4 1 0.5 0.5
    5 1 0.5 0.5
    6 1 0.5 0.5
    7 0 -0.5 0.5
    8 0 -0.5 0.5

    Now, take the mean of the absolute deviations.

    $$\text{MAD}_\text{prop} = \frac{\sum |\mathbf{x}-\hat{p}|}{n} = 0.5$$

    Spreadsheet

    You can do this with a spreadsheet.

    [Figure: spreadsheet solution]

    R

    You can do this with R.

    x = read.csv("get_MAD.csv")$x
    phat = mean(x)
    deviations = x-phat
    AbsDev = abs(deviations)
    MAD = mean(AbsDev)
    x
    ## [1] 0 0 1 1 1 1 0 0
    phat
    ## [1] 0.5
    deviations
    ## [1] -0.5 -0.5  0.5  0.5  0.5  0.5 -0.5 -0.5
    AbsDev
    ## [1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
    MAD
    ## [1] 0.5

    Algebra

    With a little bit of algebra, we can simplify the formula for $\text{MAD}_\text{prop}$. (I’ve not shown the algebra…)

    $$\text{MAD}_\text{prop} = 2(1-\hat{p})\hat{p}$$

    In other words, if $n_0$ is the number of 0s and $n_1$ is the number of 1s, then
    $$\text{MAD}_\text{prop} = 2 \cdot \frac{n_0}{n} \cdot \frac{n_1}{n}$$
    So, in this case:
    $$\text{MAD}_\text{prop} = 2 \cdot \frac{4}{8}\cdot \frac{4}{8} = 0.5$$
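    For completeness, here is one way to fill in the skipped algebra. Each 0 is a distance $\hat{p}$ from the mean, and each 1 is a distance $1-\hat{p}$ from the mean. Also, $n_0/n = 1-\hat{p}$ and $n_1/n = \hat{p}$.

```latex
\begin{aligned}
\text{MAD}_\text{prop}
  &= \frac{n_0\,|0-\hat{p}| + n_1\,|1-\hat{p}|}{n} \\
  &= \frac{n_0}{n}\,\hat{p} + \frac{n_1}{n}\,(1-\hat{p}) \\
  &= (1-\hat{p})\,\hat{p} + \hat{p}\,(1-\hat{p}) \\
  &= 2(1-\hat{p})\hat{p}
\end{aligned}
```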


  36. Question

    A sample was gathered.

    $$\mathbf{w} = 0, 0, 0, 0, 0, 1, 0, 1, 0, 1$$

    You can download the data as a CSV file.

    Determine the variance of the sample. You can round your final answer to the hundredths place.


    Solution

    First, determine the mean of the sample. Remember, if a sample is all 0s and 1s, then the sample’s mean is the same as the sample proportion ($\hat{p}$).

    $$\hat{p} = \bar{w} = \frac{\sum \mathbf{w}}{n} = 0.3$$

    Determine the squared deviations (squared distances from $\hat{p}$).

    i   w   deviation ($\mathbf{w}-\hat{p}$)   SqrDev ($(\mathbf{w}-\hat{p})^2$)
    1 0 -0.3 0.09
    2 0 -0.3 0.09
    3 0 -0.3 0.09
    4 0 -0.3 0.09
    5 0 -0.3 0.09
    6 1 0.7 0.49
    7 0 -0.3 0.09
    8 1 0.7 0.49
    9 0 -0.3 0.09
    10 1 0.7 0.49

    Now, take the mean of the squared deviations to determine the variance.

    $$\text{VAR} = \frac{\sum_{i=1}^n (w_i-\bar{w})^2}{n} = \frac{\sum_{i=1}^n (w_i-\hat{p})^2}{n} = 0.21$$

    IMPORTANT NOTE (notation and central limit theorem):

    I will usually use $\mathbf{w}$ (not conventional) to represent raw data of 0s and 1s instead of $\mathbf{x}$. In the context of 0s and 1s, $x_i$ usually denotes the number of successes in $n$ trials.
    $$x_i = \sum_{j=1}^{n} w_{i,j}$$
    When $x_i$ represents a count of successes in $n$ independent trials (each with the same probability of success), the counts ($\mathbf{x}$) follow a binomial distribution. The binomial distribution is a special case of the more general distributions of sums or means, which tend to be approximately normal.

    The Central Limit Theorem states that random averages (means) and random sums of many independent measurements approximately follow normal probability distributions. The expected value and standard deviation of the sampling distribution are either calculated from the underlying distribution’s parameters or estimated from a sample’s statistics.

    Notice that discussions of binomial distributions rarely mention the underlying Bernoulli distribution, and even when they do, “$w$” is not used. So, this notation is not conventional. The conventional notation for $\bar{w}$ is $\hat{p}$, where the hat denotes “estimated”, because the sample proportion $\hat{p}$ is an estimate of the underlying population proportion $p$.

    Spreadsheet

    You can do this with a spreadsheet.

    [Figure: spreadsheet solution]

    And, actually, you can skip a lot of work by using the VAR.P function. Notice, you use the population function even though the data is a sample. This is because, with proportions, the mean and variance are intrinsically linked (not independent).

    In fact, we will see that the sample variance of 0s and 1s can be calculated from the sample proportion.

    $$\text{VAR}_{\text{bernoulli}} = \hat{p}(1-\hat{p})$$

    R

    You can do this with R.

    w = read.csv("get_VAR.csv")$w
    phat = mean(w)
    deviations = w-phat
    sqrdev = deviations^2
    VAR = mean(sqrdev)
    w
    ##  [1] 0 0 0 0 0 1 0 1 0 1
    phat
    ## [1] 0.3
    deviations
    ##  [1] -0.3 -0.3 -0.3 -0.3 -0.3  0.7 -0.3  0.7 -0.3  0.7
    sqrdev
    ##  [1] 0.09 0.09 0.09 0.09 0.09 0.49 0.09 0.49 0.09 0.49
    VAR
    ## [1] 0.21
    # The built-in var() function almost works, but it is too fancy, and makes a Bessel correction. To use it, we need to undo the Bessel correction.
    n = length(w)
    var(w)*(n-1)/(n)
    ## [1] 0.21

    Algebra

    As mentioned earlier, there is an intrinsic link between $\hat{p}$ (the mean of 0s and 1s) and $s^2$ (the variance of 0s and 1s).

    Let $n_0$ represent the number of 0s and $n_1$ represent the number of 1s.
    $$\hat{p} = \frac{n_1}{n_0+n_1}$$
    We can find a simple formula for the variance when the data are 0s and 1s (Bernoulli).
    $$\begin{aligned}\text{VAR}_\text{bern} &= \frac{\sum(\mathbf{w}-\hat{p})^2}{n_0+n_1}\\ &= \frac{n_0(\hat{p}-0)^2+n_1(1-\hat{p})^2}{n_0+n_1}\\ &= \frac{n_0\left(\frac{n_1}{n_0+n_1}\right)^2+n_1\left(1-\frac{n_1}{n_0+n_1}\right)^2}{n_0+n_1}\\ &= \frac{n_0\left(\frac{n_1}{n_0+n_1}\right)^2+n_1\left(\frac{n_0}{n_0+n_1}\right)^2}{n_0+n_1}\\ &= \frac{n_0 n_1 (n_1+n_0)}{(n_0+n_1)^3}\\ &= \frac{n_0}{n_0+n_1}\cdot\frac{n_1}{n_0+n_1}\\ &= \hat{p}(1-\hat{p}) \end{aligned}$$

    So, you could have just found $\hat{p}$:
    $$\hat{p} = \frac{n_1}{n_0+n_1} = \frac{3}{10} = 0.3$$
    Then used the formula:
    $$\text{VAR}_\text{bern} = \hat{p}(1-\hat{p}) = (0.3)(1-0.3) = 0.21$$
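    Here is a quick numerical check of the shortcut in R. The two helper names below are hypothetical. var_bern_direct follows the table (mean of squared deviations, no Bessel correction), and var_bern_formula uses $\hat{p}(1-\hat{p})$.

```r
# Hypothetical helpers: two ways to get the variance of 0s and 1s (no Bessel correction)
var_bern_direct <- function(w) mean((w - mean(w))^2)   # mean of squared deviations
var_bern_formula <- function(w) {
  p_hat <- mean(w)                                     # sample proportion
  p_hat * (1 - p_hat)
}

w <- c(0, 0, 0, 0, 0, 1, 0, 1, 0, 1)
var_bern_direct(w)    # 0.21
var_bern_formula(w)   # 0.21
```

    The two agree (up to floating-point rounding) for any vector of 0s and 1s, which is what the derivation shows.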


  37. Question

    A sample was gathered.

    $$\mathbf{w} = 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0$$

    You can download the data as a CSV file.

    Determine the standard deviation of the sample. Because the data are all 0s and 1s, you would never make Bessel’s correction (even if guessing the population’s standard deviation), because the mean and standard deviation are not independent parameters. You can round your final answer to the hundredths place.


    Solution

    First, determine the mean of the sample. Remember, if a sample is all 0s and 1s, then the sample’s mean is the same as the sample proportion ($\hat{p}$).

    $$\hat{p} = \bar{w} = \frac{\sum \mathbf{w}}{n} = 0.375$$

    Determine the squared deviations (squared distances from $\hat{p}$).

    i   w   deviation ($\mathbf{w}-\hat{p}$)   SqrDev ($(\mathbf{w}-\hat{p})^2$)
    1 1 0.625 0.390625
    2 1 0.625 0.390625
    3 0 -0.375 0.140625
    4 1 0.625 0.390625
    5 1 0.625 0.390625
    6 0 -0.375 0.140625
    7 0 -0.375 0.140625
    8 1 0.625 0.390625
    9 0 -0.375 0.140625
    10 1 0.625 0.390625
    11 0 -0.375 0.140625
    12 0 -0.375 0.140625
    13 0 -0.375 0.140625
    14 0 -0.375 0.140625
    15 0 -0.375 0.140625
    16 0 -0.375 0.140625

    Now, take the mean of the squared deviations to determine the variance.

    $$\text{VAR} = \frac{\sum_{i=1}^n (w_i-\bar{w})^2}{n} = \frac{\sum_{i=1}^n (w_i-\hat{p})^2}{n} = 0.234375$$
    The standard deviation is the square root of the variance.

    $$s = \sqrt{0.234375} = 0.48412$$

    IMPORTANT NOTE (notation and central limit theorem):

    I will usually use $\mathbf{w}$ (not conventional) to represent raw data of 0s and 1s instead of $\mathbf{x}$. In the context of 0s and 1s, $x_i$ usually denotes the number of successes in $n$ trials.
    $$x_i = \sum_{j=1}^{n} w_{i,j}$$
    When $x_i$ represents a count of successes in $n$ independent trials (each with the same probability of success), the counts ($\mathbf{x}$) follow a binomial distribution. The binomial distribution is a special case of the more general distributions of sums or means, which tend to be approximately normal.

    The Central Limit Theorem states that random averages (means) and random sums of many independent measurements approximately follow normal probability distributions. The expected value and standard deviation of the sampling distribution are either calculated from the underlying distribution’s parameters or estimated from a sample’s statistics.

    Notice that discussions of binomial distributions rarely mention the underlying Bernoulli distribution, and even when they do, “$w$” is not used. So, this notation is not conventional. The conventional notation for $\bar{w}$ is $\hat{p}$, where the hat denotes “estimated”, because the sample proportion $\hat{p}$ is an estimate of the underlying population proportion $p$.

    Spreadsheet

    You can do this with a spreadsheet.

    [Figure: spreadsheet solution]

    And, actually, you can skip a lot of work by using the STDEV.P function. Notice, you use the population function even though the data is a sample. This is because, with proportions, the mean and standard deviation are intrinsically linked (not independent).

    In fact, we will see that the sample standard deviation of 0s and 1s can be calculated from the sample proportion.

    $$s_{\text{bern}} = \sqrt{\hat{p}(1-\hat{p})}$$

    R

    You can do this with R.

    w = read.csv("get_SD.csv")$w
    phat = mean(w)
    deviations = w-phat
    sqrdev = deviations^2
    VAR = mean(sqrdev)
    SD = sqrt(VAR)
    w
    ##  [1] 1 1 0 1 1 0 0 1 0 1 0 0 0 0 0 0
    phat
    ## [1] 0.375
    deviations
    ##  [1]  0.625  0.625 -0.375  0.625  0.625 -0.375 -0.375  0.625 -0.375
    ## [10]  0.625 -0.375 -0.375 -0.375 -0.375 -0.375 -0.375
    sqrdev
    ##  [1] 0.390625 0.390625 0.140625 0.390625 0.390625 0.140625 0.140625
    ##  [8] 0.390625 0.140625 0.390625 0.140625 0.140625 0.140625 0.140625
    ## [15] 0.140625 0.140625
    VAR
    ## [1] 0.234375
    SD
    ## [1] 0.4841229
    # The built-in sd() function almost works, but it is too fancy, and makes a Bessel correction. To use it, we need to undo the Bessel correction.
    n = length(w)
    sd(w)*sqrt((n-1)/(n))
    ## [1] 0.4841229

    Algebra

    As mentioned earlier, there is an intrinsic link between $\hat{p}$ (the mean of 0s and 1s) and $s$ (the standard deviation of 0s and 1s). It is easiest to derive the variance $s^2$ first and then take the square root.

    Let $n_0$ represent the number of 0s and $n_1$ represent the number of 1s.
    $$\hat{p} = \frac{n_1}{n_0+n_1}$$
    We can find a simple formula for the variance when the data are 0s and 1s (Bernoulli).
    $$\begin{aligned}\text{VAR}_\text{bern} &= \frac{\sum(\mathbf{w}-\hat{p})^2}{n_0+n_1}\\ &= \frac{n_0(\hat{p}-0)^2+n_1(1-\hat{p})^2}{n_0+n_1}\\ &= \frac{n_0\left(\frac{n_1}{n_0+n_1}\right)^2+n_1\left(1-\frac{n_1}{n_0+n_1}\right)^2}{n_0+n_1}\\ &= \frac{n_0\left(\frac{n_1}{n_0+n_1}\right)^2+n_1\left(\frac{n_0}{n_0+n_1}\right)^2}{n_0+n_1}\\ &= \frac{n_0 n_1 (n_1+n_0)}{(n_0+n_1)^3}\\ &= \frac{n_0}{n_0+n_1}\cdot\frac{n_1}{n_0+n_1}\\ &= \hat{p}(1-\hat{p}) \end{aligned}$$

    So, you could have just found $\hat{p}$:
    $$\hat{p} = \frac{n_1}{n_0+n_1} = \frac{6}{16} = 0.375$$
    Then used the formula:
    $$s_\text{bern} = \sqrt{\hat{p}(1-\hat{p})} = \sqrt{(0.375)(1-0.375)} = 0.4841229$$
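    To see how much Bessel’s correction would change the answer here, this sketch compares the shortcut $\sqrt{\hat{p}(1-\hat{p})}$ with R’s sd(), which divides by $n-1$.

```r
w <- c(1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0)
p_hat <- mean(w)                        # 6/16 = 0.375
s_bern <- sqrt(p_hat * (1 - p_hat))     # no Bessel correction: about 0.4841229
s_bessel <- sd(w)                       # divides by n - 1 = 15: 0.5 for this sample
c(s_bern = s_bern, s_bessel = s_bessel)
```

    With only 16 measurements, the two values differ by about 0.016. The gap shrinks as $n$ grows, because $\sqrt{(n-1)/n}$ approaches 1.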


  38. Question

    A sample was taken from an unknown population. The values were organized into a boxplot.

    [Figure: boxplot of the sample]

    For simplicity, assume no measurements lie on the hinges, median, or whisker tips (so we do not worry about inclusive vs. exclusive boundaries). This assumption is approximately true with a very large sample from a continuous distribution.


    1. What proportion of measurements are below 84.9? $\text{prop}[\mathbf{x}< 84.9] = ?$
    2. What proportion of measurements are above 60? $\text{prop}[\mathbf{x}> 60] = ?$
    3. What proportion of measurements are between 79.2 and 84.9? $\text{prop}[79.2< \mathbf{x} < 84.9] = ?$
    4. What proportion of measurements are closer than 3.3 units to 63.3? $\text{prop}[|\mathbf{x}-63.3|<3.3] = ?$
    5. Determine boundary $b$ such that $\text{prop}[\mathbf{x}<b]=0.25$.
    6. Determine boundary $b$ such that $\text{prop}[\mathbf{x}>b]=0.5$.
    7. What is the range?
    8. What is the interquartile range (IQR)?
    9. What is the median?

    Solution

    You need to know that each region (whisker or half-box) contains 25% of the measurements.


    1. 1
    2. 1
    3. 0.25
    4. 0.5
    5. 60.9
    6. 66.6
    7. Subtract min from max: $84.9-60 = 24.9$.
    8. Subtract $Q_1$ from $Q_3$: $79.2-60.9 = 18.3$.
    9. 66.6
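    The reasoning depends only on the five-number summary, so it can be sketched in R. The values below are the ones used in the answers above. The helper prop_below is hypothetical. It only works for boundaries that are one of the five summary values, and it relies on the stated assumption that no measurement lies exactly on a boundary.

```r
# Five-number summary read off the boxplot (the values used in the answers above)
five <- c(min = 60, Q1 = 60.9, median = 66.6, Q3 = 79.2, max = 84.9)

# Hypothetical helper: proportion below b, where b must be one of the five values.
# Each of the four regions (whisker or half-box) holds 25% of the measurements.
prop_below <- function(b) {
  k <- which(abs(five - b) < 1e-9)   # position of b in the summary (1 to 5)
  stopifnot(length(k) == 1)
  (k - 1) * 0.25
}

prop_below(84.9)                                  # 1. 1
1 - prop_below(60)                                # 2. 1
prop_below(84.9) - prop_below(79.2)               # 3. 0.25
prop_below(63.3 + 3.3) - prop_below(63.3 - 3.3)   # 4. interval (60, 66.6): 0.5
unname(five["max"] - five["min"])                 # 7. range, about 24.9
unname(five["Q3"] - five["Q1"])                   # 8. IQR, about 18.3
```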

  39. Question

    Five different populations were sampled, and the measurements were visualized as five boxplots. (Note: typical boxplots indicate outliers with dots. For simplicity, these boxplots include all outliers in the whiskers.)

    [Figure: five boxplots of samples V, W, X, Y, and Z]

    1. Which sample contains the largest value? V / W / X / Y / Z
    2. Which sample contains the smallest value? V / W / X / Y / Z
    3. Which sample has the largest median? V / W / X / Y / Z
    4. Which sample has the smallest median? V / W / X / Y / Z
    5. Which sample has the largest IQR? V / W / X / Y / Z
    6. Which sample has the smallest IQR? V / W / X / Y / Z


    Solution

    1. You find the sample with a whisker that goes furthest to the right.
    2. You find the sample with a whisker that goes furthest to the left.
    3. You find the sample with the median (thick line in middle of box) furthest to the right.
    4. You find the sample with the median (thick line in middle of box) furthest to the left.
    5. You find the sample with the widest box.
    6. You find the sample with the narrowest box.


  40. Question

    Match the five boxplots with their appropriate description.

    [Figure: five boxplots numbered 1 to 5]


    1. Bell (Enter an integer between 1 and 5)
    2. Right-skew (Enter an integer between 1 and 5)
    3. Bimodal (Enter an integer between 1 and 5)
    4. Uniform (Enter an integer between 1 and 5)
    5. Left-skew (Enter an integer between 1 and 5)

    Solution


    1. 5
    2. 3
    3. 2
    4. 4
    5. 1

  41. Question

    A continuous random variable (spinner/random number generator/infinite population) can be visualized with a density curve, a spinner, and a cumulative curve.

    [Figure: density curve]

    [Figure: spinner]

    [Figure: cumulative curve]


    1. Evaluate $\text{prop}[\mathbf{x}<70]$
    2. Evaluate $\text{prop}[\mathbf{x}>68]$
    3. Evaluate $\text{prop}[|\mathbf{x}-70|<8]$
    4. Determine integer $b$ such that $\text{prop}[\mathbf{x}<b]=0.42$
    5. Determine integer $b$ such that $\text{prop}[\mathbf{x}>b]=0.42$
    6. Determine integer $r$ such that $\text{prop}[|\mathbf{x}-74|<r]=0.16$

    Solution

    For each problem, you can use any of the visualizations. In short, the answers:

    ## 0.5 0.52 0.64 66 74 2

    1. The answer is 0.5 because $\text{prop}[\mathbf{x}<70]=0.5$. The following visualizations show this. To use the density curve, count percent boxes. Each box adds 0.01 to the proportion. You may need to count half boxes, or find partial boxes that add up to whole boxes.
      [Figure: density curve]
      On the spinner, you can determine the size of a region by using the outside tickmarks.
      [Figure: spinner]
      To use the cumulative curve, you need to read coordinates.
      [Figure: cumulative curve]
    2. The answer is 0.52 because $\text{prop}[\mathbf{x}>68]=0.52$. The following visualizations show this. To use the density curve, count percent boxes. Each box adds 0.01 to the proportion. You may need to count half boxes, or find partial boxes that add up to whole boxes.
      [Figure: density curve]
      On the spinner, you can determine the size of a region by using the outside tickmarks. The region below 68 has size 0.48, so $1-0.48=0.52$.
      [Figure: spinner]
      To use the cumulative curve, you need to read coordinates. The cumulative proportion at $x=68$ is 0.48, so $1-0.48=0.52$.
      [Figure: cumulative curve]
    3. The answer is 0.64 because $\text{prop}[|\mathbf{x}-70|<8]=0.64$. Note that this interval has a center of 70 and a radius of 8, so its lower bound is 62 and its upper bound is 78. The following visualizations show this. To use the density curve, count percent boxes. Each box adds 0.01 to the proportion. You may need to count half boxes, or find partial boxes that add up to whole boxes.
      [Figure: density curve]
      On the spinner, you can determine the size of a region by using the outside tickmarks. The lower bound is at $x=62$, which corresponds to a cumulative proportion of 0.18. The upper bound is at $x=78$, which corresponds to a cumulative proportion of 0.82. $0.82-0.18=0.64$
      [Figure: spinner]
      To use the cumulative curve, you need to read coordinates. The lower bound is at $x=62$, which corresponds to a cumulative proportion of 0.18. The upper bound is at $x=78$, which corresponds to a cumulative proportion of 0.82. $0.82-0.18=0.64$
      [Figure: cumulative curve]
    4. The answer is 66 because $\text{prop}[\mathbf{x}<66]=0.42$. The following visualizations show this. To use the density curve, count percent boxes. Each box adds 0.01 to the proportion. You may need to count half boxes, or find partial boxes that add up to whole boxes.
      [Figure: density curve]
      On the spinner, you can determine the size of a region by using the outside tickmarks.
      [Figure: spinner]
      To use the cumulative curve, you need to read coordinates.
      [Figure: cumulative curve]
    5. The answer is 74 because $\text{prop}[\mathbf{x}>74]=0.42$. The following visualizations show this. To use the density curve, count percent boxes. Each box adds 0.01 to the proportion. You may need to count half boxes, or find partial boxes that add up to whole boxes.
      [Figure: density curve]
      On the spinner, you can determine the size of a region by using the outside tickmarks. The region below 74 has size 0.58, so $1-0.58=0.42$.
      [Figure: spinner]
      To use the cumulative curve, you need to read coordinates. The cumulative proportion at $x=74$ is 0.58, so $1-0.58=0.42$.
      [Figure: cumulative curve]
    6. The answer is 2 because $\text{prop}[|\mathbf{x}-74|<2]=0.16$. You will need to guess and check to arrive at this answer. The following shows how to check after making the correct guess. This interval has a center of 74 and a radius of 2, so its lower bound is 72 and its upper bound is 76. To use the density curve, count percent boxes. Each box adds 0.01 to the proportion. You may need to count half boxes, or find partial boxes that add up to whole boxes.
      [Figure: density curve]
      On the spinner, you can determine the size of a region by using the outside tickmarks. The lower bound is at $x=72$, which corresponds to a cumulative proportion of 0.52. The upper bound is at $x=76$, which corresponds to a cumulative proportion of 0.68. $0.68-0.52=0.16$
      [Figure: spinner]
      To use the cumulative curve, you need to read coordinates. The lower bound is at $x=72$, which corresponds to a cumulative proportion of 0.52. The upper bound is at $x=76$, which corresponds to a cumulative proportion of 0.68. $0.68-0.52=0.16$
      [Figure: cumulative curve]

  42. Question

    Background

    In statistics, the word “normal” does not mean “typical”. Instead, “normal” refers to a very important continuous distribution: the normal distribution. Normal distributions are important because random sums and random averages are approximately normal (see central limit theorem). For example, suppose you repeatedly roll 100 dice and take the sum of each set of 100. Those sums will be approximately normally distributed, even though single rolls are discrete-uniformly distributed.

    A normal distribution has a bell-shaped density curve. Two parameters dictate the center and spread of the bell: the mean ($\mu$) and the standard deviation ($\sigma$). The normal density curve is defined by the following equation:
    $$\text{normal density as a function of } x = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
    where $\mu$ is the mean, $\sigma$ is the standard deviation, $\pi$ is the ratio of a circle’s circumference to its diameter, and $e$ is Euler’s number.
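    You can check the density formula against R’s built-in dnorm(). In this sketch, normal_density is a hypothetical name for the formula typed out by hand. Both pi and exp() are built into R.

```r
# The normal density formula, typed out by hand
normal_density <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma)^2)
}

x_grid <- seq(20, 80, by = 5)
all.equal(normal_density(x_grid, mu = 50, sigma = 10),
          dnorm(x_grid, mean = 50, sd = 10))   # TRUE
normal_density(50, mu = 50, sigma = 10)        # peak height 1/(10*sqrt(2*pi)), about 0.0399
```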

    For the first exam, we are expected to know the empirical rule. We are going to round unconventionally, so for us it could be called the “68-96-100 rule” (the conventional rounding is the 68-95-99.7 rule). It implies that in a normal distribution, about 68% of the measurements are within 1 standard deviation of the mean, about 96% are within 2 standard deviations of the mean, and nearly 100% are within 3 standard deviations of the mean.

    We can visualize the “68-96-100” rule with a density curve. Notice the area of each region is shown, and you can estimate the areas by counting percentage boxes.

    [Figure: normal density curve showing the 68-96-100 rule]

    We can display our 68-96-100 rule with a spinner.

    [Figure: spinner showing the 68-96-100 rule]

    We can also display our 68-96-100 rule with a cumulative curve. Here, we introduce the notation $z$ for the multiplier of $\sigma$. For example, if $z=-2$, then the measurement is $\mu-2\sigma$. We call $z$ the standard score.

    [Figure: cumulative curve showing the 68-96-100 rule]

    Actual Question:

    Population $X$ (with infinite individuals) has measurements that are normally distributed with mean $\mu=50$ and standard deviation $\sigma=10$. Use the empirical rule (68-96-100 rule) to answer the following questions.


    1. Evaluate $\text{prop}[X<40]$.
    2. Evaluate $\text{prop}[X>50]$.
    3. Evaluate $\text{prop}[|X-50|<20]$.
    4. Determine boundary $b$ such that $\text{prop}[X<b]=0.84$.
    5. Determine boundary $b$ such that $\text{prop}[X>b]=0.02$.
    6. Determine radius $r$ such that $\text{prop}[|X-50|<r]=0.68$.

    Solution

    It helps to draw a diagram using the supplied mean ($\mu=50$) and standard deviation ($\sigma=10$).

    [Figures: three diagrams (density curve, spinner, and cumulative curve) with $\mu=50$ and $\sigma=10$]


    1. The answer is 0.16 because $\text{prop}[X<40]=0.16$. Since $40=\mu-\sigma$, the area below it is $(1-0.68)/2=0.16$.
      [Figures: density curve, spinner, and cumulative curve]
    2. The answer is 0.5 because $\text{prop}[X>50]=0.5$. The value 50 is the mean, and a normal distribution is symmetric about its mean.
      [Figures: density curve, spinner, and cumulative curve]
    3. The answer is 0.96 because $\text{prop}[|X-50|<20]=0.96$. The interval $(30,70)$ is $\mu\pm2\sigma$.
      [Figures: density curve, spinner, and cumulative curve]
    4. The answer is 60 because $\text{prop}[X<60]=0.84$. Since $0.84=0.5+0.34$, $b$ is one standard deviation above the mean: $50+10=60$.
      [Figures: density curve, spinner, and cumulative curve]
    5. The answer is 70 because $\text{prop}[X>70]=0.02$. Since $0.02=(1-0.96)/2$, $b$ is two standard deviations above the mean: $50+20=70$.
      [Figures: density curve, spinner, and cumulative curve]
    6. The answer is 10 because $\text{prop}[|X-50|<10]=0.68$. A proportion of 0.68 corresponds to $\mu\pm\sigma$, so $r=\sigma=10$.
      [Figures: density curve, spinner, and cumulative curve]

  43. Question

    Background

    A standard score ($z$) can be calculated from a measurement ($x$), the population mean ($\mu$), and the population standard deviation ($\sigma$).
    $$z = \frac{x-\mu}{\sigma}$$
    With a little algebra, you can rearrange this into a formula solved for the measurement.
    $$x = \mu+z\sigma$$

    Actual question

    A gambler is interested in the sum of 160 rolls of 4-sided dice. The process of summing 160 rolls of 4-sided dice can be repeated infinitely many times, giving independent results each time, so those sums can be thought of as an infinitely large population. This population happens to be approximately normal (see central limit theorem).

    The gambler knows how to calculate the population mean (see discrete-uniform distribution).
    $$\mu = 160\cdot\frac{1+4}{2} = 400$$
    She also knows how to calculate the population standard deviation.
    $$\sigma = \sqrt{160} \cdot \sqrt{\frac{4^2-1}{12}} = 14.14$$


    1. If the gambler rolled a sum of 419 (in other words, got a measurement $x=419$), what is the standard score? (Determine $z$; you can round to the nearest hundredth.)
    2. If the gambler got a standard score of $z=0.21$, what is the sum? (Determine $x$, rounding to the nearest integer.)

    Solution

    1. Use the formula. $z = \frac{x-\mu}{\sigma} = \frac{419-400}{14.14} = 1.34$
    2. Use the formula. $x = \mu+z\sigma = 400+(0.21)(14.14) = 402.97 \approx 403$
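    Here is the same arithmetic in R. It computes $\sigma$ without rounding it first, so $z$ comes out as about 1.3435 before rounding to 1.34.

```r
n_rolls <- 160
mu <- n_rolls * (1 + 4) / 2                      # population mean: 400
sigma <- sqrt(n_rolls) * sqrt((4^2 - 1) / 12)    # population SD: about 14.14
z <- (419 - mu) / sigma                          # part 1: standard score of x = 419
x <- mu + 0.21 * sigma                           # part 2: measurement with z = 0.21
c(z = round(z, 2), x = round(x))
```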

  44. Question

    A sample statistic tends to approach its corresponding population parameter as the sample size grows, but a small sample usually has some error. Let’s explore this idea with an example.

    A geometric distribution is a well-studied discrete population. The spinner below represents a geometric distribution with the following population parameters.
    $$\mu = 5.01$$
    $$\sigma = 5.49$$

    [Figure: spinner for the geometric distribution]

    That spinner was spun many times. The raw data is displayed below and can be downloaded as a CSV.

    11, 1, 1, 17, 3, 1, 18, 2, 3, 1, 0, 6, 2, 12, 16, 4, 5, 17, 9, 0, 11, 0, 0, 2, 12, 6, 1, 0, 9, 0, 4, 10, 0, 4, 16, 1, 2, 2, 2, 10, 0, 3, 8, 4, 8, 4, 9, 0, 53, 27, 0, 5, 0

    Please calculate the following sample statistics.


    1. What is the sample size? (Determine $n$.)
    2. What is the sample mean? (Determine $\bar{x}$. You can round to the nearest hundredth.)
    3. What is the absolute difference between the sample mean and the population mean? (Calculate $|\bar{x}-\mu|$. You can round to the nearest hundredth.)
    4. What is the sample median?
    5. What is the sample variance? (Determine $s^2$. Use Bessel’s correction. You can round to the nearest hundredth.)
    6. What is the sample standard deviation? (Determine $s$. Use Bessel’s correction. You can round to the nearest hundredth.)
    7. What is the absolute difference between the sample standard deviation and the population standard deviation? (Calculate $|s-\sigma|$. You can round to the nearest hundredth.)

    Solution

    Notice the sample mean and sample standard deviation do not match the population mean and population standard deviation exactly.


    1. 53
    2. 6.4528302
    3. 1.4428302
    4. 4
    5. 78.9448476
    6. 8.8850913
    7. 3.3950913
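    You can reproduce these statistics in R, the language used elsewhere in these solutions. This sketch types the data in from the list above instead of reading the CSV, and copies the population values from the question. R's `var()` and `sd()` divide by $n-1$, so they already apply Bessel's correction.

```r
x <- c(11, 1, 1, 17, 3, 1, 18, 2, 3, 1, 0, 6, 2, 12, 16, 4, 5, 17, 9, 0,
       11, 0, 0, 2, 12, 6, 1, 0, 9, 0, 4, 10, 0, 4, 16, 1, 2, 2, 2, 10,
       0, 3, 8, 4, 8, 4, 9, 0, 53, 27, 0, 5, 0)
mu <- 5.01     # population mean (given in the question)
sigma <- 5.49  # population standard deviation (given in the question)

n <- length(x)
x_bar <- mean(x)
s2 <- var(x)   # divides by n - 1 (Bessel's correction)
s <- sd(x)     # square root of var(x)

round(c(n = n, mean = x_bar, abs_mean_error = abs(x_bar - mu),
        median = median(x), variance = s2, sd = s,
        abs_sd_error = abs(s - sigma)), 2)
```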

  45. Question

    A normal distribution is a well-studied continuous population. The spinner below represents a normal distribution with the following population parameters.
    $$\mu = 50$$
    $$\sigma = 8$$

    [Figure: spinner for the normal distribution]

    That spinner was spun many times. The raw data is displayed below and can be downloaded as a CSV.

    46.1334, 46.5424, 47.8209, 35.8491, 41.2869, 51.1657, 46.4394, 44.1339, 43.8125, 44.9495, 44.8426, 60.0421, 51.0485, 55.4653, 55.2529, 37.767, 54.026, 63.5898, 34.3265, 47.3499, 52.8916, 46.2517, 45.3903, 64.2519, 40.937, 60.1959, 53.3353, 53.0661, 49.5026, 58.5484, 55.6279, 52.2783, 36.2523, 55.7074, 34.5496, 37.9945, 62.39, 67.77, 37.5741, 39.4837, 43.6251, 33.3043, 51.8723, 49.5009, 47.3149, 60.925, 59.945, 51.1887

    Please calculate the following sample proportions. (All answers can be rounded to nearest hundredth.)


    1. Calculate $\text{prop}[\mathbf{x}<47]$.
    2. Calculate $\text{prop}[\mathbf{x}>57.4]$.
    3. Calculate $\text{prop}[|\mathbf{x}-50|<2.2]$.
    4. Calculate $\text{prop}[|\mathbf{x}-50|>3]$.

    Solution

    1. 0.4375
    2. 0.1875
    3. 0.1458333
    4. 0.7708333
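    In R, a sample proportion is just the mean of a logical vector, because `TRUE` counts as 1 and `FALSE` as 0. Here is a short sketch with the 48 measurements typed in from above.

```r
x <- c(46.1334, 46.5424, 47.8209, 35.8491, 41.2869, 51.1657, 46.4394, 44.1339,
       43.8125, 44.9495, 44.8426, 60.0421, 51.0485, 55.4653, 55.2529, 37.767,
       54.026, 63.5898, 34.3265, 47.3499, 52.8916, 46.2517, 45.3903, 64.2519,
       40.937, 60.1959, 53.3353, 53.0661, 49.5026, 58.5484, 55.6279, 52.2783,
       36.2523, 55.7074, 34.5496, 37.9945, 62.39, 67.77, 37.5741, 39.4837,
       43.6251, 33.3043, 51.8723, 49.5009, 47.3149, 60.925, 59.945, 51.1887)

# mean() of a logical vector = fraction of TRUE values = sample proportion
p1 <- mean(x < 47)
p2 <- mean(x > 57.4)
p3 <- mean(abs(x - 50) < 2.2)  # within 2.2 of 50
p4 <- mean(abs(x - 50) > 3)    # more than 3 away from 50

round(c(p1, p2, p3, p4), 2)
```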

  46. Question

    A standard eight-sided die was rolled many times, and the results were organized into the histogram shown below.

    [Figure: histogram of the eight-sided die rolls]


    1. Evaluate $\text{prop}[\mathbf{x}<2.5]$.
    2. Evaluate $\text{prop}[\mathbf{x}>3.5]$.
    3. Evaluate $\text{prop}[|\mathbf{x}-4.5|<2]$.
    4. Evaluate $\text{prop}[|\mathbf{x}-4|>2.5]$.
    5. Determine the half-integer boundary $b$ such that $\text{prop}[\mathbf{x}<b] = 0.72$.
    6. Determine the half-integer boundary $b$ such that $\text{prop}[\mathbf{x}>b] = 0.88$.
    7. Determine the half-integer radius $r$ such that $\text{prop}[|\mathbf{x}-5|<r] = 0.65$.
    8. Determine the half-integer radius $r$ such that $\text{prop}[|\mathbf{x}-4|>r] = 0.57$.

    Solution

    1. The answer is 0.18 because $\text{prop}[\mathbf{x}<2.5]=0.18$.
    2. The answer is 0.68 because $\text{prop}[\mathbf{x}>3.5]=0.68$.
    3. The answer is 0.54 because $\text{prop}[|\mathbf{x}-4.5|<2]=0.54$.
    4. The answer is 0.4 because $\text{prop}[|\mathbf{x}-4|>2.5]=0.4$.
    5. The answer is 6.5 because $\text{prop}[\mathbf{x}<6.5]=0.72$.
    6. The answer is 1.5 because $\text{prop}[\mathbf{x}>1.5]=0.88$.
    7. The answer is 2.5 because $\text{prop}[|\mathbf{x}-5|<2.5]=0.65$.
    8. The answer is 1.5 because $\text{prop}[|\mathbf{x}-4|>1.5]=0.57$.

  47. Question

    Three large samples were taken from three different populations. Their distributions are shown as histograms.

    [Figure: histograms of samples X, Y, and Z]

    1. Which sample has the smallest mean? X / Y / Z
    2. Which sample has the largest mean? X / Y / Z
    3. Which sample has the smallest standard deviation? X / Y / Z
    4. Which sample has the largest standard deviation? X / Y / Z


    Solution

    All three distributions look like they have a similar bell shape, but their centers and spreads are all different.


    1. The distribution whose center sits furthest left has the smallest mean.
    2. The distribution whose center sits furthest right has the largest mean.
    3. The narrowest distribution has the smallest standard deviation. It is this simple because all the distributions have the same shape.
    4. The widest distribution has the largest standard deviation. It is this simple because all the distributions have the same shape.

  48. Question

    Three large samples were taken from three different populations. Their distributions are shown as histograms.

    [Figure: histograms of samples X, Y, and Z]

    1. Which sample has the smallest mean? X / Y / Z
    2. Which sample has the largest mean? X / Y / Z
    3. Which sample has the smallest standard deviation? X / Y / Z
    4. Which sample has the largest standard deviation? X / Y / Z


    Solution

    All three distributions look like they have similar ranges (widths), but different shapes. So, we will use the fact that, for a given range, a bell shape has the smallest standard deviation and a bimodal shape has the largest standard deviation (among the shapes shown). This is because a bell shape has many measurements near the middle, whereas a bimodal shape has many measurements near the edges of its interval.


    1. The distribution whose center sits furthest left has the smallest mean.
    2. The distribution whose center sits furthest right has the largest mean.
    3. The bell shape has the smallest standard deviation. It is this simple because all the distributions have the same range, but the bell has a higher fraction near its mean.
    4. The bimodal shape has the largest standard deviation. It is this simple because all the distributions have the same range, but the bimodal shape has a higher fraction far from its mean.

  49. Question

    Five different populations were sampled, and the measurements were visualized as five boxplots. (Note: typical boxplots indicate outliers with dots. For simplicity, these boxplots include all outliers in the whiskers.)

    [Figure: boxplots of samples V, W, X, Y, and Z]

    1. Which sample contains the largest value? V / W / X / Y / Z
    2. Which sample contains the smallest value? V / W / X / Y / Z
    3. Which sample has the largest median? V / W / X / Y / Z
    4. Which sample has the smallest median? V / W / X / Y / Z
    5. Which sample has the largest IQR? V / W / X / Y / Z
    6. Which sample has the smallest IQR? V / W / X / Y / Z


    Solution

    1. You find the sample with a whisker that goes furthest to the right.
    2. You find the sample with a whisker that goes furthest to the left.
    3. You find the sample with the median (thick line in middle of box) furthest to the right.
    4. You find the sample with the median (thick line in middle of box) furthest to the left.
    5. You find the sample with the widest box.
    6. You find the sample with the narrowest box.


  50. Question

    When measuring individuals from a population, we expect most measurements to be within the interval of typical measurements. We will define the interval of typical measurements (using interval notation):
    $$\text{interval of typical measurements} = (\mu-2\sigma,\,\mu+2\sigma)$$
    In other words, we expect most measurements to be between two bounds.
    $$\text{lower bound of interval of typical measurements} = \mu-2\sigma$$
    $$\text{upper bound of interval of typical measurements} = \mu+2\sigma$$

    Actual Question

    A population of lizards has a mean length of $\mu = 72.5$ cm and a standard deviation of $\sigma=4.0$ cm. Determine the interval of typical measurements.


    1. Determine the lower bound of the interval of typical measurements.
    2. Determine the upper bound of the interval of typical measurements.

    Solution

    You need to use the formulas. Remember your order of operations!


    1. Use the formula. $\mu-2\sigma = 72.5 - 2(4) = 64.5$
    2. Use the formula. $\mu+2\sigma = 72.5 + 2(4) = 80.5$
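    Here are the same two bounds in R, as a small sketch. `c(-2, 2) * sigma` builds both offsets at once, and R evaluates the multiplication before the addition, just as the order-of-operations reminder says.

```r
mu <- 72.5    # mean lizard length (cm)
sigma <- 4.0  # standard deviation (cm)

# interval of typical measurements: (mu - 2*sigma, mu + 2*sigma)
bounds <- mu + c(-2, 2) * sigma
bounds
```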

  51. Question

    We can visualize the “68-96-100” rule with a density curve. Notice the area of each region is shown, and you can estimate the areas by counting percentage boxes.

    [Figure: normal density curve with the area of each region shown]

    Population $X$ has measurements that are normally distributed with mean $\mu=70$ and standard deviation $\sigma=9$. Use the empirical rule (68-96-100 rule) to answer the following questions.


    1. Evaluate $\text{prop}[X<88]$.
    2. Evaluate $\text{prop}[X>79]$.
    3. Evaluate $\text{prop}[|X-70|<18]$.
    4. Determine boundary $b$ such that $\text{prop}[X<b]=0.02$.
    5. Determine boundary $b$ such that $\text{prop}[X>b]=0.5$.
    6. Determine radius $r$ such that $\text{prop}[|X-70|<r]=0.68$.

    Solution

    It helps to draw a diagram using the supplied mean ($\mu=70$) and standard deviation ($\sigma=9$).

    [Figure: empirical-rule diagram for $\mu=70$ and $\sigma=9$]


    1. The answer is 0.98 because $88 = \mu+2\sigma$, so $\text{prop}[X<88]=0.98$.
    2. The answer is 0.16 because $79 = \mu+\sigma$, so $\text{prop}[X>79]=0.16$.
    3. The answer is 0.96 because $18 = 2\sigma$, so $\text{prop}[|X-70|<18]=0.96$.
    4. The answer is 52 because $52 = \mu-2\sigma$, so $\text{prop}[X<52]=0.02$.
    5. The answer is 70 because $70 = \mu$, so $\text{prop}[X>70]=0.5$.
    6. The answer is 9 because $9 = \sigma$, so $\text{prop}[|X-70|<9]=0.68$.

  52. Question

    A standard score ($z$) can be calculated from a measurement ($x$), the population mean ($\mu$), and the population standard deviation ($\sigma$).
    $$z = \frac{x-\mu}{\sigma}$$

    The following (normal) spinner has population mean $\mu=50$ and population standard deviation $\sigma=7$.

    [Figure: normal spinner with $\mu=50$ and $\sigma=7$]


    1. If a spin has a measurement $x=69.25$, what is the standard score ($z$)?
    2. If a spin has a standard score $z=-2.68$, what is the measurement ($x$)?

    Solution

    1. Use the formula. $z = \frac{x-\mu}{\sigma} = \frac{69.25-50}{7} = 2.75$
    2. Use the formula. $x = \mu+z\sigma = 50+(-2.68)(7) = 31.24$

  53. Question

    A sample of size $n=400$ was taken from an unknown population.

     3.13, 8.95, 8.53, 2.79, 7.41, 8.11, 9.97, 5.88, 8.32, 8.19,
     1.80, 9.17, 8.28, 9.98, 5.46, 7.92, 9.84, 9.76, 9.83, 7.19,
     9.42, 5.17, 9.64, 8.93, 9.33, 8.15, 5.50, 9.77, 4.66, 9.29,
     6.11, 7.85, 4.25, 1.06, 9.99, 8.34, 7.05, 3.47, 9.18, 8.54,
     6.33, 8.51, 8.59, 6.33, 6.69, 9.33, 8.95, 9.62, 6.29, 1.10,
     4.18, 7.61, 7.76, 9.96, 7.77, 5.42, 9.01, 8.22, 9.09, 6.29,
     6.85, 7.57, 9.77, 2.63, 5.30, 9.24, 6.38, 5.54, 8.23, 3.57,
     8.88, 6.72, 7.92, 9.03, 9.54, 6.28, 0.68, 7.64, 9.43, 2.78,
     9.67, 9.90, 5.71, 2.13, 9.20, 8.59, 9.09, 8.91, 6.27, 9.68,
     7.65, 7.45, 9.88,10.00, 2.87, 8.64, 5.20, 9.97, 9.45, 5.26,
     4.88, 8.99, 8.51, 8.16, 6.26, 9.92, 6.13, 6.89, 4.66, 9.73,
     8.64, 3.05, 7.90, 5.81, 6.18, 7.78, 5.11, 3.55, 3.47, 8.65,
     7.12, 4.50,10.00, 5.14, 4.71, 3.44, 8.41, 8.90, 8.41, 4.66,
     9.46, 8.49, 9.89, 9.49, 9.85, 8.81, 9.88, 5.90, 9.89, 8.98,
     8.86, 2.96, 2.73, 8.74, 7.19, 9.91, 9.16, 5.76, 5.58, 5.05,
     5.55, 9.85, 7.14, 7.74, 9.22, 2.38, 9.38, 3.13, 2.59, 7.61,
     5.49, 9.61, 9.43, 7.09, 4.31, 9.02, 4.04, 9.21, 8.86, 4.00,
     7.73, 2.48, 8.30, 8.46, 8.77, 9.91, 4.55, 8.11, 4.93, 5.33,
     6.33, 7.84, 7.44, 8.33, 7.86, 2.12, 4.73, 1.39, 8.17, 9.50,
     8.79, 9.55, 6.41, 3.58, 8.60, 7.98, 6.95, 7.91,10.00, 9.30,
     8.50, 3.22, 2.43, 8.57, 9.23, 4.02, 7.51, 8.59, 9.97, 7.40,
     7.47, 9.92, 7.83, 9.90, 9.69, 9.99, 8.18, 2.74, 7.30, 9.81,
     5.64, 3.90, 9.62, 4.12, 9.35, 3.17, 2.37, 2.32, 9.61, 9.99,
     7.98, 9.80, 5.58, 2.69, 4.08, 5.90, 5.93, 9.64, 1.70, 6.93,
     8.03, 9.63, 9.19, 1.34, 9.13, 3.55, 4.62, 1.98, 2.45, 8.35,
     7.36, 8.12, 9.98, 7.03, 8.07, 8.31, 1.65, 7.44, 7.47, 9.16,
     1.08, 4.48, 6.76, 5.39, 9.84, 6.44, 6.91, 5.93, 6.66, 6.13,
     8.63, 9.97, 6.06, 8.91, 5.15, 3.50, 9.93, 4.64, 9.97, 9.71,
     5.61, 2.08, 6.91, 9.82, 5.93, 4.90, 8.17, 7.26, 9.38,10.00,
     9.98, 7.18, 9.65, 4.25, 9.68,10.00, 9.34, 8.23, 9.93, 9.88,
     9.71, 2.52, 9.30, 5.83, 4.73, 7.59, 3.31, 2.88, 3.29, 9.36,
     9.25, 5.06, 4.99, 9.51, 9.42, 2.12, 9.71, 8.97, 9.62, 9.84,
     4.52, 6.58, 6.99, 1.28, 0.63, 9.78, 8.10, 8.96, 6.83, 9.52,
     3.23, 8.92, 9.95, 9.93, 2.78, 8.98, 9.37, 8.21, 8.64, 8.33,
     2.28, 9.55, 9.55, 7.29, 7.72, 8.02, 5.41, 8.06, 0.93, 9.07,
     6.25, 3.53, 8.79, 5.95, 5.55, 8.93, 9.52, 7.61, 9.82, 3.24,
     2.70, 5.53, 8.79, 7.90, 5.51, 8.72, 5.96, 7.66, 8.68, 9.99,
     9.82, 4.69, 9.36, 2.51, 6.62, 8.78, 3.27, 6.43, 9.19, 7.31,
     7.72, 8.66, 9.99, 9.46, 5.48, 9.39, 9.95, 4.96, 8.71, 8.91,
     1.47, 5.96, 2.92, 9.13, 8.89, 9.86, 8.80, 8.85, 6.93, 9.94

    You can download the data as a CSV. Determine which histogram visualizes the data, and describe the shape of the data.

    [Figure: candidate histograms]



    Solution

    You should make a histogram. This is easy in R.

    # read column x of the downloaded CSV, then draw a histogram
    x <- read.csv("make_hist.csv")$x
    hist(x)

    [Figure: histogram produced by hist(x)]

    Using a spreadsheet is much more work, but you could make a frequency distribution and decide from there.
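    You can also build a frequency distribution (the spreadsheet route) in R with `cut()` and `table()`. In this sketch, the bins of width 1 from 0 to 10 are an illustrative choice and not part of the question. Only the first row of the data is typed in to keep the example short. With the full sample, you would use `read.csv("make_hist.csv")$x` as above.

```r
# first row of the sample only (illustration); the full data has n = 400
x <- c(3.13, 8.95, 8.53, 2.79, 7.41, 8.11, 9.97, 5.88, 8.32, 8.19)

# bins of width 1 covering 0 to 10; include.lowest keeps a value of exactly 0
bins <- cut(x, breaks = seq(0, 10, by = 1), include.lowest = TRUE)
freq <- table(bins)
freq
```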



  54. Question

    In a deck of strange cards, there are 426 cards. Each card has an image and a color. The amounts are shown in the table below and can be downloaded as a csv.

    [Table: counts of cards by image and color]

    (Answers can be rounded to nearest hundredth.)


    1. A house is more likely than a pig to be blue. / A pig is more likely than a house to be blue.
    2. What is the probability a random card is a cat?
    3. What is the probability a random card is a bike given it is blue?
    4. What is the probability a random card is yellow given it is a house?
    5. What is the probability a random card is either a cat or blue (or both)?
    6. What is the probability a random card is both a house and blue?
    7. What is the probability a random card is blue?

    Solution

    The key logical terms are “and”, “or”, and “given”. Notice that I am using “given” as a shorter version of “under the condition”.


    1. $\mathbb{P}[\text{blue given house}] = 0.12$ and $\mathbb{P}[\text{blue given pig}] = 0.48$, so a pig is more likely to be blue than a house is.
    2. $\mathbb{P}[\text{cat}]=\frac{86}{426}=0.202$
    3. $\mathbb{P}[\text{bike given blue}]=\frac{50}{146}=0.342$
    4. $\mathbb{P}[\text{yellow given house}]=\frac{30}{83}=0.361$
    5. $\mathbb{P}[\text{cat or blue}]=\frac{86+146-26}{426}=0.484$ (the 26 blue cats are subtracted so they are not counted twice)
    6. $\mathbb{P}[\text{house and blue}]=\frac{10}{426}=0.0235$
    7. $\mathbb{P}[\text{blue}]=\frac{146}{426}=0.343$
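    Here are the same fractions computed in R. The counts are read off the fractions in the solution above. For example, 26 is the number of cards that are both a cat and blue, which inclusion-exclusion subtracts so those cards are not counted twice. The full table is not reproduced here, and the variable names are only illustrative.

```r
n_total <- 426
n_cat <- 86
n_blue <- 146
n_cat_and_blue <- 26
n_bike_and_blue <- 50
n_house <- 83
n_yellow_and_house <- 30
n_house_and_blue <- 10

p_cat <- n_cat / n_total
p_bike_given_blue <- n_bike_and_blue / n_blue         # "given blue": divide by the blue count
p_yellow_given_house <- n_yellow_and_house / n_house  # "given house": divide by the house count
p_cat_or_blue <- (n_cat + n_blue - n_cat_and_blue) / n_total  # inclusion-exclusion
p_house_and_blue <- n_house_and_blue / n_total
p_blue <- n_blue / n_total
p_blue_given_house <- n_house_and_blue / n_house      # used in part 1

round(c(p_cat, p_bike_given_blue, p_yellow_given_house,
        p_cat_or_blue, p_house_and_blue, p_blue, p_blue_given_house), 3)
```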

  55. Question

    A spinner was constructed:

    [Figure: spinner]

    The spinner’s probability distribution is shown below.

    x     P[x]
    10    0.21
    12    0.09
    13    0.49
    15    0.15
    20    0.06

    It can also be downloaded as a csv.


    1. What is the probability of spinning 13? In other words, what is $\mathbb{P}[X=13]$?
    2. What is the probability of spinning 10 or 15? In other words, what is $\mathbb{P}[X=10 ~\text{ or }~ X=15]$?
    3. If spinning twice, what is the probability of first spinning 10 and then spinning 15? In other words, what is $\mathbb{P}[X_1=10 ~\text{ and }~ X_2=15]$?
    4. What is the probability of spinning at least 13? In other words, what is $\mathbb{P}[X\ge 13]$?
    5. Determine the mean of the probability distribution by using $\mu = \sum x\cdot \mathbb{P}[x]$.
    6. Determine the standard deviation of the probability distribution by using $\sigma = \sqrt{\sum (x-\mu)^2 \cdot \mathbb{P}[x]}$.

    Solution

    Make a table (for the mean and standard deviation parts).

    x     P[x]    x·P[x]    x−μ    (x−μ)²    (x−μ)²·P[x]
    10    0.21    2.1       -3     9         1.89
    12    0.09    1.08      -1     1         0.09
    13    0.49    6.37       0     0         0
    15    0.15    2.25       2     4         0.6
    20    0.06    1.2        7     49        2.94
    =========================================================
    Sums: Σ x·P[x] = μ = 13;   Σ (x−μ)²·P[x] = σ² = 5.52, so σ = √5.52 = 2.349468

    1. 0.49
    2. $0.21 + 0.15 = 0.36$
    3. $0.21 \times 0.15 = 0.0315$ (the two spins are independent, so multiply)
    4. $0.49 + 0.15 + 0.06 = 0.7$
    5. $\mu = 13$
    6. $\sigma = 2.349468$
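    Here is a sketch of the same calculations in R, with the distribution typed in from the table. Outcomes of one spin are disjoint, so their probabilities add. Two separate spins are treated as independent, so their probabilities multiply.

```r
x <- c(10, 12, 13, 15, 20)
p <- c(0.21, 0.09, 0.49, 0.15, 0.06)
stopifnot(abs(sum(p) - 1) < 1e-12)       # probabilities must sum to 1

p_13 <- p[x == 13]                        # part 1
p_10_or_15 <- sum(p[x %in% c(10, 15)])    # part 2: disjoint outcomes, so add
p_10_then_15 <- p[x == 10] * p[x == 15]   # part 3: independent spins, so multiply
p_at_least_13 <- sum(p[x >= 13])          # part 4

mu <- sum(x * p)                          # part 5: mean
sigma <- sqrt(sum((x - mu)^2 * p))        # part 6: standard deviation

c(mu = mu, sigma = sigma)
```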

  56. Question

    A pizza shop has 14 different toppings available. You will choose 4 different toppings for your pizza. How many possibilities exist?


    Solution

    This scenario describes a combinations problem (order of selection does not matter). We are considering the subsets of size 4 from a set of size 14.

    $$
    \begin{aligned}
    {_nC_r} &= \frac{n!}{(n-r)! \cdot r!} \\
    n &= 14 \\
    r &= 4 \\
    {_{14}C_{4}} &= \frac{14!}{(14-4)!\cdot 4!} \\
    &= \frac{14!}{10! \cdot 4!} \\
    &= \frac{14\cdot13\cdot12\cdot11}{4\cdot3\cdot2\cdot1} \\
    &= 1001
    \end{aligned}
    $$

    Remember, we care about combinations because they count all the ways we can arrange $r$ 1s and $n-r$ 0s. So, in this case, ${_{14}C_{4}} = 1001$ tells us there are 1001 ways of arranging 10 0s and 4 1s. (Think of the 1s as the toppings that are selected and the 0s as the toppings NOT selected.)

    So, if you had a lot of time, you could list out all possibilities:

    Count Possibility
    1 0 1 1 0 0 0 0 1 0 0 0 1 0 0
    2 0 1 0 0 0 1 1 0 0 0 1 0 0 0
    3 0 0 0 0 0 0 1 0 0 1 0 1 1 0
    4 0 1 0 1 0 0 0 0 0 1 0 0 0 1
    5 0 0 1 0 0 0 0 0 0 1 0 1 0 1
    ⋮ ⋮
    997 0 0 0 0 0 1 1 0 1 0 0 0 1 0
    998 0 0 0 0 1 0 0 1 0 1 0 0 0 1
    999 1 0 0 1 0 0 1 1 0 0 0 0 0 0
    1000 0 1 0 0 1 1 0 0 1 0 0 0 0 0
    1001 0 0 0 0 0 0 0 1 1 1 1 0 0 0

    Of course, you’d want to be more systematic than that…
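    R's `combn()` does the systematic listing for you. This sketch numbers the toppings 1 to 14 (a labeling chosen only for illustration). It counts the subsets with `choose()`, lists them with `combn()`, and turns each subset into a row of 0s and 1s like the listing above.

```r
n <- 14  # toppings available
r <- 4   # toppings chosen

choose(n, r)              # number of combinations

subsets <- combn(n, r)    # r x choose(n, r) matrix; each column is one subset
ncol(subsets)

# one row per possibility: 1 = topping selected, 0 = not selected
indicator <- t(apply(subsets, 2, function(s) as.integer(seq_len(n) %in% s)))
head(indicator, 3)
```

    Changing `n` to 17 gives `choose(17, 4)`, which is 2380, the shirt count in the next question.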


  57. Question

    Joe is shopping for shirts. Joe likes 17 of the shirts, but will only buy 4 of them. How many possibilities exist?


    Solution

    This scenario describes a combinations problem (order does not matter). We are considering the subsets of size 4 from a set of size 17.

    $$
    \begin{aligned}
    {_nC_r} &= \frac{n!}{(n-r)! \cdot r!} \\
    n &= 17 \\
    r &= 4 \\
    {_{17}C_{4}} &= \frac{17!}{(17-4)!\cdot 4!} \\
    &= \frac{17!}{13! \cdot 4!} \\
    &= \frac{17\cdot16\cdot15\cdot14}{4\cdot3\cdot2\cdot1} \\
    &= 2380
    \end{aligned}
    $$


  58. Question

    A company needs to select a CFO, a president, and a secretary. Each position will be held by a different person. The company is considering the same pool of 24 applicants for each position. How many possibilities exist?


    Solution

    This scenario describes a permutations problem (order matters). We are considering the nonrepeating sequences of size 3 from a set of size 24.

    $$
    \begin{aligned}
    {_nP_r} &= \frac{n!}{(n-r)!} \\
    n &= 24 \\
    r &= 3 \\
    {_{24}P_{3}} &= \frac{24!}{(24-3)!} \\
    &= \frac{24!}{21!} \\
    &= 24\cdot23\cdot22 \\
    &= 12144
    \end{aligned}
    $$

    If you had a lot of time, you could list out all possibilities (using 1 for the CFO, 2 for the president, 3 for the secretary, and 0 for an applicant who is not selected):

    Count Possibility
    1 0 0 0 0 1 0 0 0 0 3 0 0 0 0 0 0 2 0 0 0 0 0 0 0
    2 0 3 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    3 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1
    4 0 3 0 0 0 0 0 0 0 0 0 2 0 0 0 1 0 0 0 0 0 0 0 0
    5 0 0 0 0 0 0 0 0 0 0 0 3 0 0 1 0 0 0 0 2 0 0 0 0
    ⋮ ⋮
    12140 0 2 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 3 0 0 0 0 0
    12141 1 2 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0
    12142 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 2 0 0 0 1 0 0
    12143 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0
    12144 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 2

    Of course, you’d want to be more systematic than that.
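    Here is a sketch of the permutation count in R. `factorial(n) / factorial(n - r)` mirrors the formula, but $24!$ is too large to store exactly as a double, so multiplying only the $r$ falling terms with `prod()` is safer. The systematic listing uses `expand.grid()` over applicant numbers 1 to 24 (an illustrative labeling) and keeps only triples of three different people.

```r
n <- 24  # applicants
r <- 3   # positions: CFO, president, secretary

prod(n:(n - r + 1))  # 24 * 23 * 22

# every ordered (CFO, president, secretary) triple of applicant numbers
g <- expand.grid(cfo = 1:n, president = 1:n, secretary = 1:n)

# keep only triples where the three positions go to different people
keep <- g$cfo != g$president & g$cfo != g$secretary & g$president != g$secretary
assignments <- g[keep, ]
nrow(assignments)
```

    For the prize question that follows, `prod(20:16)` gives 1860480 in the same way.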


  59. Question

    A team has 20 players. The coach will give out 5 different prizes to different players. How many ways could the coach do this?


    Solution

    This scenario describes a permutations problem (order matters). We are considering the nonrepeating sequences of size 5 from a set of size 20.

    $$
    \begin{aligned}
    {_nP_r} &= \frac{n!}{(n-r)!} \\
    n &= 20 \\
    r &= 5 \\
    {_{20}P_{5}} &= \frac{20!}{(20-5)!} \\
    &= \frac{20!}{15!} \\
    &= 20\cdot19\cdot18\cdot17\cdot16 \\
    &= 1860480
    \end{aligned}
    $$


  60. Question

    In some situation, each trial has a 0.61 probability of success. There will be 10 trials. (Thus the number of successes will follow a binomial distribution.)


    1. What is the probability of getting exactly 3 successes? In other words, determine $\mathbb{P}[X = 3]$.
    2. What is the probability of getting exactly 8 successes? In other words, determine $\mathbb{P}[X = 8]$.
    3. What is the probability of getting more than 8 successes? In other words, determine $\mathbb{P}[X > 8]$.
    4. What is the probability of getting at least 8 successes? In other words, determine $\mathbb{P}[X \ge 8]$.
    5. What is the probability of getting less than 8 successes? In other words, determine $\mathbb{P}[X < 8]$.
    6. What is the probability of getting at most 8 successes? In other words, determine $\mathbb{P}[X \le 8]$.
    7. Determine the mean number of successes.
    8. Determine the standard deviation of the number of successes.

    Solution

    This is a binomial distribution, so use the appropriate formulas.
    $$\mathbb{P}[X=x] ~=~ {_n\text{C}_x} \cdot p^x(1-p)^{n-x}$$
    where $p$ is the probability of success on each trial, $x$ is a specific number of successes, $n$ is the number of trials, and $C$ is the combinations operator (so that ${_n}C_x = \frac{n!}{(n-x)! \, x!}$). Some people prefer to also use $q$ as the probability of failure, such that $q=1-p$.
    $$\mathbb{P}[X=x] ~=~ {_n\text{C}_x} \cdot p^xq^{n-x}$$

    You will also need to add mutually exclusive probabilities (when multiple $x$-values satisfy the probability’s condition). It is also helpful to be aware of the complement rule.


    1. $\mathbb{P}[X=3] ~=~ \left({_{10}}C_{3}\right)\left(0.61^{3}\right)\left(0.39^{7}\right) ~=~ 0.0373786$
    2. $\mathbb{P}[X=8] ~=~ \left({_{10}}C_{8}\right)\left(0.61^{8}\right)\left(0.39^{2}\right) ~=~ 0.1312141$
    3. $\mathbb{P}[X > 8] ~=~ \mathbb{P}[X=9] + \mathbb{P}[X=10] ~=~ 0.0527406$
    4. $\mathbb{P}[X \ge 8] ~=~ \mathbb{P}[X=8] + \mathbb{P}[X>8] ~=~ 0.1839547$
    5. $\mathbb{P}[X < 8] ~=~ 1 - \mathbb{P}[X \ge 8] ~=~ 0.8160453$
    6. $\mathbb{P}[X \le 8] ~=~ 1 - \mathbb{P}[X > 8] ~=~ 0.9472594$
    7. Because this is a binomial distribution, $\mu = np$, so $\mu = 6.1$.
    8. Because this is a binomial distribution, $\sigma = \sqrt{npq}$, so $\sigma = 1.5424007$.
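    You can check these answers with R's built-in binomial functions. `dbinom(x, size, prob)` gives $\mathbb{P}[X=x]$, and `pbinom(q, size, prob)` gives $\mathbb{P}[X \le q]$ (with `lower.tail = FALSE`, it gives $\mathbb{P}[X > q]$ instead). Because $X$ takes only whole numbers, $\mathbb{P}[X \ge 8] = \mathbb{P}[X > 7]$ and $\mathbb{P}[X < 8] = \mathbb{P}[X \le 7]$.

```r
n <- 10     # trials
p <- 0.61   # probability of success on each trial

dbinom(3, n, p)                        # P[X = 3]
dbinom(8, n, p)                        # P[X = 8]
pbinom(8, n, p, lower.tail = FALSE)    # P[X > 8]  = 1 - P[X <= 8]
pbinom(7, n, p, lower.tail = FALSE)    # P[X >= 8] = P[X > 7]
pbinom(7, n, p)                        # P[X < 8]  = P[X <= 7]
pbinom(8, n, p)                        # P[X <= 8]
n * p                                  # mean
sqrt(n * p * (1 - p))                  # standard deviation
```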

  61. Question

    Bob has a 0.69 chance of winning a game. If Bob wins, he has a 0.48 chance of being happy after the game. If Bob loses, he has a 0.21 chance of being happy after the game.

    After the game, you notice Bob is happy. What is the probability that Bob won his game? (Do not answer as a percentage; answer as a decimal.)


    Solution

    Use the definition of conditional probability.
    $$P(\text{win given happy}) = \frac{P(\text{win and happy})}{P(\text{happy})}$$
    We can first determine all the joint probabilities.
    $$P(A\text{ and }B) = P(A)\cdot P(B\text{ given } A)$$
    $$
    \begin{aligned}
    P(\text{win and happy}) &= 0.69\times0.48 = 0.3312\\
    P(\text{win and sad}) &= 0.69\times(1-0.48) = 0.3588\\
    P(\text{lose and happy}) &= (1-0.69)\times0.21 = 0.0651\\
    P(\text{lose and sad}) &= (1-0.69)\times(1-0.21) = 0.2449
    \end{aligned}
    $$

    Notice there are two disjoint ways Bob could be happy.
    $$
    \begin{aligned}
    P(\text{happy}) &= P\left(\text{[win and happy] OR [lose and happy]}\right) \\
    &= P(\text{win and happy}) + P(\text{lose and happy}) \\
    &= 0.3312+0.0651 \\
    &= 0.3963
    \end{aligned}
    $$
    So, back to the conditional probability.
    $$P(\text{win given happy}) = \frac{0.3312}{0.3963} = 0.836$$
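    Here are the same steps in R, as a small sketch with illustrative variable names.

```r
p_win <- 0.69
p_happy_given_win <- 0.48
p_happy_given_lose <- 0.21

# joint probabilities: P(A and B) = P(A) * P(B given A)
p_win_and_happy <- p_win * p_happy_given_win
p_lose_and_happy <- (1 - p_win) * p_happy_given_lose

# two disjoint ways to be happy, so add
p_happy <- p_win_and_happy + p_lose_and_happy

# definition of conditional probability
p_win_given_happy <- p_win_and_happy / p_happy
round(p_win_given_happy, 3)
```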


  62. Question

    Cindy has two games today. Each game she will either win or lose. She has a 0.34 chance of winning the first game and a 0.79 chance of winning the second game.


    1. What is the chance Cindy wins both games?
    2. What is the chance Cindy loses both games?
    3. What is the chance Cindy wins once and loses once (in either order)?

    Solution

    Assuming the outcomes of the two games are independent, multiply the probabilities of the individual outcomes.
    $$P(WW) = 0.34 \times 0.79 = 0.2686$$
    $$P(LL) = 0.66 \times 0.21 = 0.1386$$
    $$P(WL) = 0.34\times 0.21 = 0.0714$$
    $$P(LW) = 0.66\times 0.79 = 0.5214$$


    1. Win, win: 0.34 * 0.79 = 0.2686
    2. Lose, lose: (1-0.34) * (1-0.79) = 0.1386
    3. WL or LW: 1-0.2686-0.1386 = 0.5928 (which matches 0.0714 + 0.5214)
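    Here is a short R sketch of the same calculation. It states the independence assumption explicitly, since the multiplication above relies on it.

```r
p1 <- 0.34  # chance of winning game 1
p2 <- 0.79  # chance of winning game 2

# assumption: the two games are independent, so probabilities multiply
p_ww <- p1 * p2
p_ll <- (1 - p1) * (1 - p2)
p_one_win <- p1 * (1 - p2) + (1 - p1) * p2  # WL or LW (disjoint, so add)

c(p_ww, p_ll, p_one_win)
```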

  63. Question

    Determine the probability that the standard normal variable is less than -0.87. In other words, evaluate $P(Z < -0.87)$.


    Solution

    The numbers that satisfy $Z<-0.87$ are on the left side of a number line (toward $-\infty$). The probability equals a left area under the density curve.

    [Figure: left area under the standard normal density curve]

    By using the z-table, we find the appropriate probability.
    $$P(Z<-0.87)=0.1922$$

    It might help to visualize with a spinner:

    [Figure: spinner visualization]

    Using a spreadsheet:

    =NORM.DIST(-0.87,0,1,TRUE)

    Using R:

    pnorm(-0.87)

  64. Question

    Determine the probability that the standard normal variable is more than 0.05. In other words, evaluate $P(Z > 0.05)$.


    Solution

    First, you need to identify that we are looking for a right area. This is because large values of $Z$ satisfy $Z>0.05$.

    [Figure: right area under the standard normal density curve]

    By using the z-table, we find the left area (even though we eventually want the right area).

    $$P(Z<0.05)=0.5199$$

    We use the complement rule to determine the right area.

    $$
    \begin{aligned}
    P(Z > 0.05) &= 1-P(Z < 0.05) \\
    &= 1-0.5199 \\
    &= 0.4801
    \end{aligned}
    $$

    Method 2: You might also recognize that the normal distribution is symmetric. Thus,

    $$
    \begin{aligned}
    P(Z > 0.05) &= P(Z < -0.05) \\
    &= 0.4801
    \end{aligned}
    $$

    Method 3: It often helps to draw a picture.

    [Figure: sketch of the right area]

    It might help to visualize with a spinner:

    [Figure: spinner visualization]

    Using a spreadsheet:

    =1-NORM.DIST(0.05,0,1,TRUE)

    Using R:

    1-pnorm(0.05)

  65. Question

    Determine the probability that the absolute standard normal variable is less than 0.89. In other words, evaluate $P\left(|Z| < 0.89\right)$.


    Solution

    First, you need to identify that we are looking for a central area. This is because $z$-scores near 0 satisfy $|Z|<0.89$, while $z$-scores far from 0 (either positive or negative) do not satisfy the inequality.

    Start with a sketch.

    [Figure: sketch of the central area]

    Remember, the entire area is always 1. Most z-tables only provide left areas, so three methods are shown to get the central area from a left-area (cumulative probability) table.

    Method 1: First find the area of the left tail.
    $$P(Z<-0.89) = 0.1867$$
    Recognize that normal distributions are symmetric, so we also know the area of the right tail.
    $$P(Z>0.89) = 0.1867$$
    The three areas add to 1.
    $$
    \begin{aligned}
    P\left(|Z|<0.89\right) &= 1-P(Z<-0.89)-P(Z>0.89) \\
    &= 1 - 0.1867 - 0.1867 \\
    &= 0.6266
    \end{aligned}
    $$

    This technique can be summarized with the following formula:
    $$P\left(|Z|<z\right) = 1-2\cdot P(Z<-z)$$
    assuming $z>0$.

    So, in our case (when $z=0.89$),
    $$
    \begin{aligned}
    P\left(|Z|<0.89\right) &= 1-2\cdot P(Z<-0.89) \\
    &= 1-2\cdot0.1867 \\
    &= 0.6266
    \end{aligned}
    $$

    This method is shown graphically:

    [Figure: Method 1 shown graphically]

    Method 2: You can also find half of the central area and then double it.
    $$P\left(|Z|<z\right) = 2\cdot\left[ P(Z<z)-0.5\right]$$
    So, in our case,
    $$
    \begin{aligned}
    P\left(|Z|<0.89\right) &= 2\cdot\left[ P(Z<0.89)-0.5\right] \\
    &= 2\cdot\left[ 0.8133-0.5\right] \\
    &= 2\cdot\left[ 0.3133\right] \\
    &= 0.6266
    \end{aligned}
    $$

    This method is shown graphically:

    [Figure: Method 2 shown graphically]

    Method 3: You can also calculate the central area with a difference of two left areas.

    $$
    \begin{aligned}
    P\left(|Z|<0.89\right) &= P(Z<0.89)-P(Z<-0.89)\\
    &= 0.8133-0.1867 \\
    &= 0.6266
    \end{aligned}
    $$

    [Figure: Method 3 shown graphically]

    It might be helpful to visualize with a spinner.

    [Figure: spinner visualization]

    In a spreadsheet, you could use the NORM.DIST() function.

    =NORM.DIST(0.89,0,1,TRUE) - NORM.DIST(-0.89,0,1,TRUE)

    In R, you could use the pnorm function.

    pnorm(0.89) - pnorm(-0.89)

  66. Question

    Determine the probability that the absolute standard normal variable is more than 0.56. In other words, evaluate $P\left(|Z| > 0.56\right)$.


    Solution

    First, you need to identify that we are looking for a two-tail area (the sum of the left and right tails). This is because $z$-scores far from 0 satisfy $|Z|>0.56$, while $z$-scores near 0 do not satisfy the inequality.

    Start with a sketch.

    [Figure: sketch of the two-tail area]

    Remember, the entire area is always 1. Most z-tables only provide left areas, so two methods are shown to get the two-tail area from a left-area (cumulative probability) table.

    Method 1: First find the area of the left tail.
    $$P(Z<-0.56) = 0.2877$$
    Recognize that normal distributions are symmetric, so we also know the area of the right tail.
    $$P(Z>0.56) = 0.2877$$
    The two areas add to our desired two-tail area.
    $$
    \begin{aligned}
    P\left(|Z|>0.56\right) &= P(Z<-0.56)+P(Z>0.56) \\
    &= 0.2877 + 0.2877 \\
    &= 0.5754
    \end{aligned}
    $$

    This technique can be summarized with the following formula:
    $$P\left(|Z|>z\right) = 2\cdot P(Z<-z)$$
    assuming $z>0$.

    So, in our case (when $z=0.56$),
    $$
    \begin{aligned}
    P\left(|Z|>0.56\right) &= 2\cdot P(Z<-0.56) \\
    &= 2\cdot0.2877 \\
    &= 0.5754
    \end{aligned}
    $$

    This method is shown graphically:

    [Figure: Method 1 shown graphically]

    Notice we need to add both tails.

    Method 2: You can achieve the same result by using the following formula:
    $$P\left(|Z|>z\right) = 2\cdot\left[1-P(Z<z) \right]$$
    So,
    $$
    \begin{aligned}
    P\left(|Z|>0.56\right) &= 2\cdot\left[1-P(Z<0.56) \right] \\
    &= 2\cdot\left[1-0.7123\right] \\
    &= 2\cdot0.2877\\
    &= 0.5754
    \end{aligned}
    $$

    [Figure: Method 2 shown graphically]

    It might be helpful to visualize with a spinner.

    [Figure: spinner visualization]

    In a spreadsheet, you could use the NORM.DIST() function.

    =2*NORM.DIST(-0.56,0,1,TRUE)

    In R, you could use the pnorm function.

    2*pnorm(-0.56)

  67. Question

    Determine the probability that the standard normal variable is between -1.35 and -0.16. In other words, evaluate $P(-1.35 < Z < -0.16)$.


    Solution

    Start with a sketch.

    [Figure: sketch of the area between -1.35 and -0.16]

    We take a difference of areas.
    $$
    \begin{aligned}
    P(-1.35<Z<-0.16) &= P(Z<-0.16) - P(Z<-1.35) \\
    &= 0.4364 - 0.0885 \\
    &= 0.3479
    \end{aligned}
    $$

    [Figures: the difference of left areas, shown graphically]

    In a spreadsheet, you could use the NORM.DIST() function.

    =NORM.DIST(-0.16,0,1,TRUE) - NORM.DIST(-1.35,0,1,TRUE)

    In R, you could use the pnorm function.

    pnorm(-0.16) - pnorm(-1.35)
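    The difference-of-areas step can be wrapped in a small R function. The name `area_between` is hypothetical; the body is just two calls to `pnorm`.

```r
# Hypothetical helper: P(a < Z < b) as a difference of two left areas
area_between <- function(a, b) {
  stopifnot(a < b)
  pnorm(b) - pnorm(a)   # P(Z < b) - P(Z < a)
}

area_between(-1.35, -0.16)   # compare with the table-based answer 0.3479
```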

  68. Question

    Determine $z$ such that $P(Z<z)=0.41$. In other words, what $z$-score is greater than 41% of standard normal values? (Answers within 0.01 of the correct value will be marked correct.)


    Solution

    Start with a sketch. Leftward numbers (toward $-\infty$) will be less than our boundary $z$, so we shade a left region with area 0.41.

    [Figure]

    You should go to your $z$-table and find the $z$-score with the left area closest to 0.41.

    $z$ $P(Z<z)$
    -0.25 0.4013
    -0.24 0.4052
    -0.23 0.409
    -0.22 0.4129
    -0.21 0.4168
    -0.2 0.4207

    It turns out the exact answer is $z=-0.227545$, which could be found by using an inverse normal function. On a spreadsheet:

    =NORM.INV(0.41,0,1)

    Using R:

    qnorm(0.41)

    But, the $z$-table is accurate enough, so I will accept either -0.23 or -0.22 (anything within 0.01 of -0.227545).

    You might find it helpful to visualize with a spinner.

    [Figure]
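    The table lookup itself can be mimicked in R. This is a sketch under assumptions: it pretends the $z$-table has two-decimal $z$ values from -3.49 to 3.49 with four-decimal areas (emulated by `round(pnorm(z), 4)`), and `table_lookup` is a hypothetical helper name.

```r
# Hypothetical helper: pick the z in a two-decimal grid whose
# (rounded) left area is closest to the target probability p
table_lookup <- function(p, z_grid = seq(-3.49, 3.49, by = 0.01)) {
  table_area <- round(pnorm(z_grid), 4)   # mimic four-decimal table entries
  z_grid[which.min(abs(table_area - p))]
}

z_table <- table_lookup(0.41)   # table-style estimate
z_exact <- qnorm(0.41)          # inverse normal function
round(c(table = z_table, exact = z_exact), 4)
```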


  69. Question

    Determine $z$ such that $P(Z>z)=0.94$. In other words, what $z$-score is less than 94% of standard normal values? (Answers within 0.01 of the correct value will be marked correct.)


    Solution

    Start with a sketch. Rightward numbers (toward $+\infty$) will be greater than our boundary $z$, so we shade a rightward region with area 0.94.

    [Figure]

    You should first find the left area. $\begin{aligned} P(Z<z) &= 1-P(Z>z) \\ &= 1-0.94 \\ &= 0.06 \end{aligned}$

    [Figure]

    You should go to your $z$-table and find the $z$-score with the left area closest to 0.06.

    $z$ $P(Z<z)$
    -1.58 0.0571
    -1.57 0.0582
    -1.56 0.0594
    -1.55 0.0606
    -1.54 0.0618
    -1.53 0.063

    It turns out the exact answer is $z=-1.5547736$, which could be found by using an inverse normal function. On a spreadsheet:

    =NORM.INV(0.06,0,1)

    Using R:

    rightarea = 0.94
    leftarea = 1-rightarea
    qnorm( leftarea )

    But, the $z$-table is accurate enough, so I will accept either -1.56 or -1.55 (anything within 0.01 of -1.5547736).

    You might find it helpful to visualize with a spinner.

    [Figure]
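    In R, `qnorm` also accepts `lower.tail = FALSE`, which lets you pass the right area directly instead of converting it first. A short sketch comparing both routes:

```r
right_area <- 0.94

# Route 1: convert to a left area first, as in the solution
z_left <- qnorm(1 - right_area)

# Route 2: ask qnorm for the boundary with the given upper-tail area
z_right <- qnorm(right_area, lower.tail = FALSE)

c(z_left, z_right)   # both should be near -1.5548
```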


  70. Question

    Determine $z$ such that $P(|Z|<z)=0.54$. In other words, how far from 0 should the boundaries be set so that 54% of standard normal values lie between them? (Answers within 0.01 of the correct value will be marked correct.)


    Solution

    Start with a sketch.

    [Figure]

    Method 1: Determine the area of each tail. Both tails have the same area, and all three areas add to 1. Thus, $P(Z>z) = \frac{1-0.54}{2} = 0.23$

    [Figure]

    The left area is the left tail plus the central area. $\begin{aligned} P(Z<z) &= 0.23+0.54 \\ &= 0.77 \end{aligned}$

    You should go to your $z$-table and find the $z$-score with the left area closest to 0.77.

    $z$ $P(Z<z)$
    0.71 0.7611
    0.72 0.7642
    0.73 0.7673
    0.74 0.7704
    0.75 0.7734
    0.76 0.7764

    It turns out the exact answer is $z=0.7388468$, which could be found by using an inverse normal function. On a spreadsheet:

    =NORM.INV(0.77,0,1)

    Using R:

    centralarea = 0.54
    leftarea = (1-centralarea)/2 + centralarea
    qnorm( leftarea )

    But, because we are using the $z$-table, I will accept either 0.73 or 0.74. (Or, really anything within 0.01 of 0.7388468.)

    Method 2: Another way to get 0.77 is by adding half of 0.54 to 0.5. $P(Z<z) = \frac{0.54}{2}+0.5 = 0.27+0.5 = 0.77$

    [Figure]

    Then, use the table. Or, you could use R:

    centralarea = 0.54
    leftarea = 0.5 + centralarea/2
    qnorm( leftarea )

    It might be helpful to visualize with a spinner.

    [Figure]
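    A minimal R sketch that turns a central area into a boundary using the Method 2 relation $P(Z<z) = 0.5 + \text{central}/2$, then checks the answer by recomputing the central area. The name `central_to_z` is hypothetical.

```r
# Hypothetical helper: z > 0 with P(|Z| < z) = central
central_to_z <- function(central) {
  qnorm(0.5 + central / 2)
}

z <- central_to_z(0.54)
z                          # near 0.7388
pnorm(z) - pnorm(-z)       # should recover the central area 0.54
```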


  71. Question

    Determine $z$ such that $P(|Z|>z)=0.2$. In other words, how far from 0 should the boundaries be set so that 20% of standard normal values lie outside them? (Answers within 0.01 of the correct value will be marked correct.)


    Solution

    Start with a sketch. The total two-tail area is 0.2, so each tail has half that area.

    [Figure]

    Method 1: Determine the area of each tail and the center. Both tails have the same area, and all three areas add to 1. Thus, $P(Z>z) = \frac{0.2}{2} = 0.1$

    $P(Z<-z) = \frac{0.2}{2} = 0.1$

    $\begin{aligned} P(|Z|<z) &= 1-P(|Z|>z) \\ &= 1-0.2 \\ &= 0.8 \end{aligned}$

    [Figure]

    Find the left area: the left tail plus the center.

    $\begin{aligned} P(Z<z) &= 0.1+0.8 \\ &= 0.9 \end{aligned}$

    You should go to your $z$-table and find the $z$-score with the left area closest to 0.9.

    $z$ $P(Z<z)$
    1.26 0.8962
    1.27 0.898
    1.28 0.8997
    1.29 0.9015
    1.3 0.9032
    1.31 0.9049

    It turns out the exact answer is $z=1.2815516$, which could be found by using an inverse normal function. On a spreadsheet:

    =NORM.INV(0.9,0,1)

    Using R:

    twotailarea = 0.2
    onetailarea = twotailarea/2
    centralarea = 1-twotailarea
    leftarea = onetailarea + centralarea
    qnorm( leftarea )

    Method 2: Another way to get 0.9 is by subtracting half of 0.2 from 1. $P(Z<z) = 1-\frac{0.2}{2} = 1-0.1 = 0.9$

    [Figure]

    Then, use the table. Or, R:

    twotailarea = 0.2
    leftarea = 1 - twotailarea/2
    qnorm( leftarea )

    You might find a spinner visualization useful.

    [Figure]
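    The same idea works for a two-tail area. In this sketch (helper name `two_tail_to_z` is hypothetical) the left area is $1 - \text{two-tail}/2$, as in Method 2, and the result is checked by doubling one tail.

```r
# Hypothetical helper: z > 0 with P(|Z| > z) = two_tail
two_tail_to_z <- function(two_tail) {
  qnorm(1 - two_tail / 2)   # left area is 1 minus one tail
}

z <- two_tail_to_z(0.2)
z                 # near 1.2816
2 * pnorm(-z)     # should recover the two-tail area 0.2
```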


  72. Question

    A farm produces 4 types of fruit: kiwis, plums, apricots, and apples. The fruits’ masses follow normal distributions, with population parameters dependent on the type of fruit.

    Type of fruit   Mean mass (g)   Standard deviation of mass (g)
    kiwis           95              8
    plums           105             8
    apricots        43              4
    apples          214             12

    One specimen of each type is weighed. The results are shown below.

    Specimen type   Mass of specimen (g)
    kiwi            103.2
    plum            100.8
    apricot         42
    apple           203.9

    The population parameters and specimen masses can be downloaded as a CSV file.

    For each measurement, determine the standard score and the cumulative probability. Then determine which specimen is most unusually large, most unusually small, most typically sized, and most unusually sized.


    1. Calculate a $z$ score for the kiwi specimen. (Round to the nearest hundredth.)
    2. What proportion of kiwis have less mass than the kiwi specimen?
    3. Calculate a $z$ score for the plum specimen. (Round to the nearest hundredth.)
    4. What proportion of plums have less mass than the plum specimen?
    5. Calculate a $z$ score for the apricot specimen. (Round to the nearest hundredth.)
    6. What proportion of apricots have less mass than the apricot specimen?
    7. Calculate a $z$ score for the apple specimen. (Round to the nearest hundredth.)
    8. What proportion of apples have less mass than the apple specimen?
    9. The kiwi specimen is most unusually large. / The plum specimen is most unusually large. / The apricot specimen is most unusually large. / The apple specimen is most unusually large.
    10. The kiwi specimen is most unusually small. / The plum specimen is most unusually small. / The apricot specimen is most unusually small. / The apple specimen is most unusually small.
    11. The kiwi specimen is most typically sized. / The plum specimen is most typically sized. / The apricot specimen is most typically sized. / The apple specimen is most typically sized.
    12. The kiwi specimen is most unusually sized. / The plum specimen is most unusually sized. / The apricot specimen is most unusually sized. / The apple specimen is most unusually sized.

    Solution

    The $z$-score of a measurement is the difference between the measurement and the population mean, divided by the population standard deviation.

    $z = \frac{x-\mu}{\sigma}$

    The highest $z$-score (furthest right on the number line) corresponds to the most unusually large measurement.

    The lowest $z$-score (furthest left on the number line) corresponds to the most unusually small measurement.

    The smallest absolute $z$-score corresponds to the most typically sized measurement.

    The largest absolute $z$-score corresponds to the most unusually sized measurement.


    1. We use the formula to calculate the kiwi specimen’s $z$ score. $z = \frac{103.2-95}{8} = 1.03$
    2. Use the $z$ table to find the cumulative probability. $P(Z<1.03) = 0.8485$
    3. We use the formula to calculate the plum specimen’s $z$ score. $z = \frac{100.8-105}{8} = -0.53$
    4. Use the $z$ table to find the cumulative probability. $P(Z<-0.53) = 0.2981$
    5. We use the formula to calculate the apricot specimen’s $z$ score. $z = \frac{42-43}{4} = -0.25$
    6. Use the $z$ table to find the cumulative probability. $P(Z<-0.25) = 0.4013$
    7. We use the formula to calculate the apple specimen’s $z$ score. $z = \frac{203.9-214}{12} = -0.84$
    8. Use the $z$ table to find the cumulative probability. $P(Z<-0.84) = 0.2005$
    9. The maximum $z$ score is $1.03$, which belongs to the kiwi.
    10. The minimum $z$ score is $-0.84$, which belongs to the apple.
    11. The minimum absolute $z$ score is $0.25$, which belongs to the apricot.
    12. The maximum absolute $z$ score is $1.03$, which belongs to the kiwi.
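    The whole fruit comparison can be done at once in R. This sketch types in the values from the two tables above rather than reading the CSV (whose file name is not given here); the column names are illustrative.

```r
# Population parameters and specimen masses from the tables above (grams)
fruit <- data.frame(
  type = c("kiwi", "plum", "apricot", "apple"),
  mu   = c(95, 105, 43, 214),
  sd   = c(8, 8, 4, 12),
  mass = c(103.2, 100.8, 42, 203.9),
  stringsAsFactors = FALSE
)

fruit$z    <- (fruit$mass - fruit$mu) / fruit$sd   # standard scores
fruit$prob <- pnorm(fruit$z)                        # cumulative probabilities
print(fruit)

# Signed z for "large"/"small", absolute z for "typical"/"unusual"
most_large   <- fruit$type[which.max(fruit$z)]
most_small   <- fruit$type[which.min(fruit$z)]
most_typical <- fruit$type[which.min(abs(fruit$z))]
most_unusual <- fruit$type[which.max(abs(fruit$z))]
c(most_large, most_small, most_typical, most_unusual)
```

    Because this uses unrounded $z$ values, the probabilities can differ slightly from the table-based answers (for example, the kiwi's exact $z$ is 1.025 rather than 1.03).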

  73. Question

    Random variable $D$ is normally distributed with mean $\mu = 40$ and standard deviation $\sigma = 10$. $D = \text{N}(40,\,10)$ Evaluate $P(D < 37)$.


    Solution

    First, draw a sketch. We can label the $D$ axis by adding integer multiples of 10 to 40. We know to shade toward the left because small values of $D$ satisfy the condition $D<37$.

    [Figure]

    We are given a specific $d$ value as a boundary. (Remember, for random variables we use uppercase letters, but for specific values we use lowercase.) $d = 37$ We calculate the $z$ value of the boundary. $\begin{aligned} z &= \frac{d-\mu}{\sigma} \\ &= \frac{37-40}{10} \\ &= -0.3 \end{aligned}$

    We have rephrased our problem into a standard normal probability problem, because $P(D<37) = P(Z<-0.3)$

    So, we just need to evaluate $P(Z<-0.3)$. To do this, you just need a $z$-table.

    $z$ $P(Z<z)$
    -0.32 0.3745
    -0.31 0.3783
    -0.3 0.3821
    -0.29 0.3859
    -0.28 0.3897

    Thus, we find our answer. $P(D<37) = 0.3821$
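    In R, `pnorm` also accepts `mean` and `sd` arguments, so you can check the standardization step by computing the probability both ways:

```r
mu <- 40; sigma <- 10; d <- 37

z <- (d - mu) / sigma                          # standardize the boundary
p_standardized <- pnorm(z)                     # P(Z < z)
p_direct <- pnorm(d, mean = mu, sd = sigma)    # P(D < d) without standardizing

c(z = z, standardized = p_standardized, direct = p_direct)
```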


  74. Question

    Random variable $H$ is normally distributed with mean $\mu = 86$ and standard deviation $\sigma = 21$. $H = \text{N}(86,\,21)$ Evaluate $P(H > 107)$.


    Solution

    First, draw a sketch. We can label the $H$ axis by adding integer multiples of 21 to 86. We know to shade toward the right because large values of $H$ satisfy the condition $H>107$.

    [Figure]

    We are given a specific $h$ value as a boundary. (Remember, for random variables we use uppercase letters, but for specific values we use lowercase.) $h = 107$ We calculate the $z$ value of the boundary. $\begin{aligned} z &= \frac{h-\mu}{\sigma} \\ &= \frac{107-86}{21} \\ &= 1 \end{aligned}$

    We have rephrased our problem into a standard normal probability problem, because $P(H>107) = P(Z>1)$

    So, we just need to evaluate $P(Z>1)$.

    To do this, you need to remember that right-area events are complementary to left-area events. $P(Z>1) = 1-P(Z<1)$

    You can use a $z$-table.

    $z$ $P(Z<z)$
    0.98 0.8365
    0.99 0.8389
    1 0.8413
    1.01 0.8438
    1.02 0.8461

    Thus, we find our answer. $\begin{aligned} P(H>107) &= 1-0.8413 \\ &= 0.1587 \end{aligned}$
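    For a right-tail probability, `pnorm` has a `lower.tail = FALSE` option that gives the right area directly. This sketch compares it with the complement rule used above:

```r
mu <- 86; sigma <- 21; h <- 107

p_complement <- 1 - pnorm((h - mu) / sigma)                      # 1 - P(Z < 1)
p_upper <- pnorm(h, mean = mu, sd = sigma, lower.tail = FALSE)   # P(H > h) directly

c(complement = p_complement, upper = p_upper)
```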


  75. Question

    Random variable $W$ is normally distributed with mean $\mu = 75$ and standard deviation $\sigma = 4$. $W = \text{N}(75,\,4)$ Evaluate $P(|W-75| < 5)$. In other words, what is the probability that $W$ is within $\pm5$ units of the mean?


    Solution

    First, draw a sketch. We can label the $W$ axis by adding integer multiples of 4 to 75. We know to shade the center because values near 75 satisfy the condition $|W-75| < 5$. We draw the boundaries at $75-5=70$ and $75+5=80$ because those are the solutions to $|W-75| = 5$. We can also rephrase the probability.

    $P(|W-75| < 5) = P(70<W<80)$

    [Figure]

    We calculate the $z$ values of the boundaries. Left boundary: $\begin{aligned} z_1 &= \frac{70-75}{4} \\ &= \frac{-5}{4} \\ &= -1.25 \end{aligned}$

    Right boundary:

    $\begin{aligned} z_2 &= \frac{80-75}{4} \\ &= \frac{5}{4} \\ &= 1.25 \end{aligned}$

    We have rephrased our problem into a standard normal probability problem, because

    $P(|W-75| < 5) = P(70<W<80) = P(-1.25<Z<1.25) = P(|Z|<1.25)$

    So, we just need to evaluate $P(|Z|<1.25)$. I will also point out that $\frac{5}{4} = 1.25$. In general, if $X$ is normally distributed, then: $P(|X-\mu|<d) = P\left(|Z| < \frac{d}{\sigma}\right)$

    From here we have a formula that lets us use the $z$ table. (We practiced this part before.) $\begin{aligned} P(|Z|<1.25) &= 2 \times P(Z<1.25)-1 \\ &= 2 \times 0.8944-1 \\ &= 0.7888 \end{aligned}$
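    The general rule $P(|X-\mu|<d) = P(|Z|<d/\sigma)$ can be checked numerically. In this sketch `prob_within` is a hypothetical helper; note it only needs $d$ and $\sigma$, not $\mu$.

```r
# Hypothetical helper: P(|X - mu| < d) for a normal X with standard deviation sigma
prob_within <- function(d, sigma) {
  2 * pnorm(d / sigma) - 1            # 2 P(Z < d/sigma) - 1
}

p_rule <- prob_within(5, 4)
p_direct <- pnorm(80, 75, 4) - pnorm(70, 75, 4)   # P(70 < W < 80)
c(rule = p_rule, direct = p_direct)
```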


  76. Question

    Random variable $Q$ is normally distributed with mean $\mu = 0.16$ and standard deviation $\sigma = 0.05$. $Q = \text{N}(0.16,\,0.05)$ Evaluate $P(|Q-0.16| > 0.11)$. In other words, what is the probability that $Q$ is outside $\pm0.11$ units of the mean?


    Solution

    First, draw a sketch. We can label the $Q$ axis by adding integer multiples of 0.05 to 0.16. We know to shade the two tails because values far from 0.16 satisfy the condition $|Q-0.16| > 0.11$. We draw the boundaries at $0.16-0.11=0.05$ and $0.16+0.11=0.27$ because those are the solutions to $|Q-0.16| = 0.11$. We can also rephrase the probability.

    $P(|Q-0.16| > 0.11) = P(Q<0.05 \text{ OR } Q>0.27)$

    [Figure]

    We calculate the $z$ values of the boundaries. Left boundary: $\begin{aligned} z_1 &= \frac{0.05-0.16}{0.05} \\ &= \frac{-0.11}{0.05} \\ &= -2.2 \end{aligned}$

    Right boundary:

    $\begin{aligned} z_2 &= \frac{0.27-0.16}{0.05} \\ &= \frac{0.11}{0.05} \\ &= 2.2 \end{aligned}$

    We have rephrased our problem into a standard normal probability problem, because

    $P(|Q-0.16| > 0.11) = P(|Z|>2.2)$

    So, we just need to evaluate $P(|Z|>2.2)$. I will also point out that $\frac{0.11}{0.05} = 2.2$. In general, if $X$ is normally distributed, then: $P(|X-\mu|>d) = P\left(|Z| > \frac{d}{\sigma}\right)$

    From here we have a formula that lets us use the $z$ table. (We practiced this part before.) $\begin{aligned} P(|Z|>2.2) &= 2- 2 \times P(Z<2.2) \\ &= 2 - 2 \times 0.9861 \\ &= 0.0278 \end{aligned}$
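    A matching sketch for the "outside" case. `prob_outside` is a hypothetical helper that doubles one tail by symmetry; the direct check adds the two tails on the $Q$ scale.

```r
# Hypothetical helper: P(|X - mu| > d) for a normal X with standard deviation sigma
prob_outside <- function(d, sigma) {
  2 * pnorm(-d / sigma)               # double the left tail
}

p_rule <- prob_outside(0.11, 0.05)
p_direct <- pnorm(0.05, 0.16, 0.05) +
  pnorm(0.27, 0.16, 0.05, lower.tail = FALSE)     # P(Q < 0.05) + P(Q > 0.27)
c(rule = p_rule, direct = p_direct)
```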


  77. Question

    Random variable $H$ is normally distributed with mean $\mu = 180$ and standard deviation $\sigma = 40$. $H = \text{N}(180,\,40)$ Evaluate $P(210 < H < 220)$. In other words, what is the probability that $H$ is between 210 and 220?


    Solution

    First, draw a sketch. We can label the $H$ axis by adding integer multiples of 40 to 180. We know to shade the region between 210 and 220 because those values satisfy the condition $210 < H < 220$.

    [Figure]

    We calculate the $z$ values of the boundaries. Left boundary: $\begin{aligned} z_1 &= \frac{h_1-\mu}{\sigma} \\ &= \frac{210-180}{40} \\ &= 0.75 \end{aligned}$

    Right boundary:

    $\begin{aligned} z_2 &= \frac{h_2-\mu}{\sigma} \\ &= \frac{220-180}{40} \\ &= 1 \end{aligned}$

    We rephrase our problem into a standard normal probability problem:

    $P(210 < H < 220) = P(0.75<Z<1)$

    So, we just need to evaluate $P(0.75<Z<1)$.

    From here we have a formula that lets us use the $z$ table. (We practiced this part before.) $\begin{aligned} P(z_1 < Z < z_2) &= P(Z<z_2) - P(Z<z_1) \\ P(0.75<Z<1) &= P(Z<1)-P(Z<0.75)\\ &= 0.8413-0.7734 \\ &= 0.0679 \end{aligned}$
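    The same difference of left areas can be computed in R, either after standardizing or directly on the $H$ scale:

```r
mu <- 180; sigma <- 40
h1 <- 210; h2 <- 220

z1 <- (h1 - mu) / sigma   # 0.75
z2 <- (h2 - mu) / sigma   # 1
p_standardized <- pnorm(z2) - pnorm(z1)
p_direct <- pnorm(h2, mu, sigma) - pnorm(h1, mu, sigma)
c(standardized = p_standardized, direct = p_direct)
```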


  78. Question

    Random variable $B$ is normally distributed with mean $\mu = 9.6$ and standard deviation $\sigma = 2.6$. $B = \text{N}(9.6,\,2.6)$ Determine $b$ such that $P(B<b) = 0.39$. In other words, determine an upper boundary such that a normal spinner with mean 9.6 and standard deviation 2.6 lands below that boundary 39% of the time.


    Solution

    First, draw a sketch. We can label the $B$ axis by adding integer multiples of 2.6 to 9.6. We know to shade the left because low values of $B$ satisfy the condition $B<b$ (regardless of the exact value of $b$). We don’t know exactly where to place the boundary, but we know the left area is 0.39.

    It is helpful to know the following approximations:

    $z$ $P(Z<z)$
    -3 0.001
    -2 0.023
    -1 0.159
    0 0.5
    1 0.841
    2 0.977
    3 0.999

    Since 0.39 is between 0.159 and 0.5, we know the $z$-score is between -1 and 0. Remember, $Z$ and $z$ always refer to the standard normal variable.

    [Figure]

    By using the $z$ table we can determine $z$ more precisely.

    $z$ $P(Z<z)$
    -0.3 0.3821
    -0.29 0.3859
    -0.28 0.3897
    -0.27 0.3936
    -0.26 0.3974
    -0.25 0.4013

    Either -0.28 or -0.27 is a good estimate of $z$, and either value will lead you to an acceptable answer. Using other tools, a more accurate value can be found. I will show the work with a more accurate value. $z=-0.2793$

    We now convert the $z$ score into a $b$ score. $\begin{aligned} b &= \mu+z\sigma \\ &= 9.6+(-0.2793)(2.6) \\ &= 8.874 \end{aligned}$

    We can also visualize this with a spinner.

    [Figure]

    The tolerance for an acceptable answer was $\pm 0.1$ from 8.8737705. So, anything between 8.7737705 and 8.9737705 was accepted.
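    In R, `qnorm` accepts `mean` and `sd`, so the conversion $b = \mu + z\sigma$ can be checked against a single call:

```r
mu <- 9.6; sigma <- 2.6; left_area <- 0.39

z <- qnorm(left_area)          # standard normal boundary, near -0.2793
b_via_z <- mu + z * sigma      # convert z back to the B scale
b_direct <- qnorm(left_area, mean = mu, sd = sigma)

c(via_z = b_via_z, direct = b_direct)
```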


  79. Question

    Random variable $K$ is normally distributed with mean $\mu = 35$ and standard deviation $\sigma = 7$. $K = \text{N}(35,\,7)$ Determine $k$ such that $P(K>k) = 0.94$. In other words, determine a lower boundary such that a normal spinner with mean 35 and standard deviation 7 lands on a value greater than that boundary 94% of the time.


    Solution

    First, draw a sketch. We can label the $K$ axis by adding integer multiples of 7 to 35. We know to shade the right because high values of $K$ satisfy the condition $K>k$ (regardless of the exact value of $k$). We don’t know exactly where to place the boundary, but we know the right area is 0.94.

    [Figure]

    We know how to find the left area. $\begin{aligned} P(K<k) &= 1-P(K>k) \\ &= 1-0.94 \\ &= 0.06 \end{aligned}$

    [Figure]

    As an intermediate step, we find $z$ such that $P(Z<z)=0.06$.

    By using the $z$ table we can determine $z$.

    $z$ $P(Z<z)$
    -1.58 0.0571
    -1.57 0.0582
    -1.56 0.0594
    -1.55 0.0606
    -1.54 0.0618
    -1.53 0.063

    Either -1.56 or -1.55 is a good estimate of $z$, and either value will lead you to an acceptable answer. Using other tools, a more accurate value can be found. I will show the work with a more accurate value. $z=-1.5548$

    We now convert the $z$ score into a $k$ score. $\begin{aligned} k &= \mu+z\sigma \\ &= 35+(-1.5548)(7) \\ &= 24.12 \end{aligned}$

    We can also visualize this with a spinner.

    [Figure]

    The tolerance for an acceptable answer was $\pm 1$ from 24.1165848. So, anything between 23.1165848 and 25.1165848 was accepted.
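    A sketch of both routes in R: the one in the solution (left area, then $z$, then $k$) and a direct upper-tail quantile using `lower.tail = FALSE`.

```r
mu <- 35; sigma <- 7; right_area <- 0.94

# Route in the solution: left area, then z, then convert to the K scale
z <- qnorm(1 - right_area)
k_via_z <- mu + z * sigma

# Direct route: upper-tail quantile on the K scale
k_direct <- qnorm(right_area, mean = mu, sd = sigma, lower.tail = FALSE)

c(via_z = k_via_z, direct = k_direct)
```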


  80. Question

    Random variable $Y$ is normally distributed with mean $\mu = 8.3$ and standard deviation $\sigma = 1.7$. $Y = \text{N}(8.3,\,1.7)$


    1. Determine $d$ such that $P(|Y-8.3|<d) = 0.2$. In other words, determine the distance from the mean of two boundaries such that this normal spinner lands within those boundaries 20% of the time.
    2. Let $y_1 = 8.3-d$ and $y_2 = 8.3+d$, so that $P(y_1 < Y < y_2) = 0.2$. Evaluate $y_1$.
    3. Using the definitions above, evaluate $y_2$.

    Solution

    First, draw a sketch. We can label the $Y$ axis by adding integer multiples of 1.7 to 8.3. We know to shade the middle because values near 8.3 satisfy the condition $|Y-8.3|<d$ (regardless of the exact value of $d$). We don’t know exactly where to place the boundaries, but we know the central area is 0.2.

    [Figure]

    As an intermediate step, let’s find $z$ such that $P(|Z|<z)=0.2$. First, we need to evaluate $P(Z<z)$.

    $\begin{aligned} P(Z<z) &= \frac{1+P(|Z|<z)}{2} \\ &= \frac{1+0.2}{2} \\ &= 0.6 \end{aligned}$

    You could also have drawn some pictures: we know there is symmetry and all the areas should add to 1.

    [Figure]

    [Figure]

    We find $z$ such that $P(Z<z)=0.6$.

    By using the $z$ table we can determine $z$.

    $z$ $P(Z<z)$
    0.23 0.591
    0.24 0.5948
    0.25 0.5987
    0.26 0.6026
    0.27 0.6064
    0.28 0.6103

    Either 0.25 or 0.26 is a good estimate of $z$, and either value will lead you to an acceptable answer. Using other tools, a more accurate value can be found. I will show the work with a more accurate value. $z=0.2533$

    We now convert the $z$ score into a $d$ score. $\begin{aligned} d &= z\sigma \\ &= (0.2533)(1.7) \\ &= 0.4307 \end{aligned}$

    We can also visualize this with a spinner.

    [Figure]


    1. We determined $d=0.4307$.
    2. It is easy to find $y_1 = 8.3-0.4307 = 7.8693$.
    3. It is easy to find $y_2 = 8.3+0.4307 = 8.7307$.
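    The steps above can be scripted in R as a check. This sketch computes $d$, the two boundaries, and then confirms the central area between them is 0.2.

```r
mu <- 8.3; sigma <- 1.7; central <- 0.2

z <- qnorm((1 + central) / 2)   # P(Z < z) = (1 + central) / 2
d <- z * sigma                  # distance from the mean on the Y scale
y1 <- mu - d
y2 <- mu + d

c(d = d, y1 = y1, y2 = y2)
pnorm(y2, mu, sigma) - pnorm(y1, mu, sigma)   # should recover 0.2
```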

  81. Question

    Random variable $V$ is normally distributed with mean $\mu = 600$ and standard deviation $\sigma = 130$. $V = \text{N}(600,\,130)$


    1. Determine $d$ such that $P(|V-600|>d) = 0.7$. In other words, determine the distance from the mean of two boundaries such that this normal spinner lands outside those boundaries 70% of the time.
    2. Let $v_1 = 600-d$, so that $P(V < v_1) = \frac{0.7}{2}$. Evaluate $v_1$.
    3. Let $v_2 = 600+d$, so that $P(V > v_2) = \frac{0.7}{2}$. Evaluate $v_2$.

    Solution

    First, draw a sketch. We can label the $V$ axis by adding integer multiples of 130 to 600. We know to shade the outer regions because values far from 600 satisfy the condition $|V-600|>d$ (regardless of the exact value of $d$). We don’t know exactly where to place the boundaries, but we know the two-tail area is 0.7.

    [Figure]

    As an intermediate step, let’s find $z$ such that $P(|Z|>z)=0.7$. First, we need to evaluate $P(Z<z)$.

    $\begin{aligned} P(Z<z) &= \frac{2-P(|Z|>z)}{2} \\ &= \frac{2-0.7}{2} \\ &= 0.65 \end{aligned}$

    You could also have drawn some pictures: we know there is symmetry and all the areas should add to 1.

    [Figure]

    [Figure]

    We find $z$ such that $P(Z<z)=0.65$.

    By using the $z$ table we can determine $z$.

    $z$ $P(Z<z)$
    0.36 0.6406
    0.37 0.6443
    0.38 0.648
    0.39 0.6517
    0.4 0.6554
    0.41 0.6591

    Either 0.38 or 0.39 is a good estimate of $z$, and either value will lead you to an acceptable answer. Using other tools, a more accurate value can be found. I will show the work with a more accurate value. $z=0.3853$

    We now convert the $z$ score into a $d$ score. $\begin{aligned} d &= z\sigma \\ &= (0.3853)(130) \\ &= 50.09 \end{aligned}$

    We can also visualize this with a spinner.

    [Figure]


    1. We determined $d=50.09$.
    2. It is easy to find $v_1 = 600-50.09 = 549.91$.
    3. It is easy to find $v_2 = 600+50.09 = 650.09$.
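    A similar R check for the two-tail version: compute $d$ and the boundaries, then add the two tail areas back up.

```r
mu <- 600; sigma <- 130; two_tail <- 0.7

z <- qnorm((2 - two_tail) / 2)   # P(Z < z) = (2 - two-tail area) / 2
d <- z * sigma
v1 <- mu - d
v2 <- mu + d

c(d = d, v1 = v1, v2 = v2)
pnorm(v1, mu, sigma) + pnorm(v2, mu, sigma, lower.tail = FALSE)  # should recover 0.7
```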

  82. Question

    This question provides $\mu$, $\sigma$, $n$, and $\bar{x}$. You will characterize the sampling distribution, calculate the standard score, determine the percentile rank (expressed as a decimal), and translate the score into a rating (see below).

    Population

    When Archie practices archery, each arrow has the same probability distribution, independent of the other arrows (see i.i.d.). This means her skill is constant, there is no hot-hand effect, and there is no maturity of chances.

    Over many months, Archie has shot ten thousand arrows (their positions are shown below as dots).

    [Figure]

    From those many arrows, Archie has determined an accurate probability distribution (of the points scored by an arrow).

    [Figure]

    Thus, Archie can determine her population mean and population standard deviation. $\mu=9.04$ $\sigma=1.03$

    Sampling distribution

    Each day, Archie shoots 48 arrows ($n=48$) and determines that day’s mean score. A mean of 10 would be a perfect day. We can treat each day’s mean as a random variable with its own probability distribution: a sampling distribution. We wish to characterize this sampling distribution. From the central limit theorem, we know the sampling distribution is approximately normal; however, we need to calculate the parameters.

    Expected mean (mean of sampling distribution)

    The expected mean is simply the population mean. $\text{expected mean} = \mu$

    “Expected mean” is a misnomer. It is not necessarily likely, or even possible, for a mean to equal the expected mean. However, if Archie repeatedly shot 48 arrows, we expect the means to have an average equal to the expected mean. So, maybe “average of means” would be better terminology.

    This expected mean is the average of the sampling distribution.

    Determine the expected mean:

    (Round to the hundredths place)

    Standard error (standard deviation of sampling distribution)

    The standard error of a mean is the quotient of the population standard deviation and the square root of the sample size. $\text{standard error of mean} = \frac{\sigma}{\sqrt{n}}$ This standard error is the standard deviation of the sampling distribution.

    Calculate the standard error:

    (Round to the thousandths place)

    Sample

    Today, Archie’s mean score is 9.062 points ($\bar{x} = 9.062$).

    [Figure]

    Standard score

    Archie would like to know how well she did today. She wants you to calculate a standard score (a $z$ score). To calculate the standard score of a sample mean, you can use the following formula. $z = \frac{\bar x - \mu}{\sigma/\sqrt{n}}$

    Calculate the standard score:

    (Round to the hundredths place)

    Cumulative probability (percentile rank)

    Archie would like to know the probability that tomorrow she shoots worse than today. To estimate this, report the cumulative probability associated with the $z$-score you calculated. This can be done with a $z$ table, the pnorm function in R, the NORM.DIST function in a spreadsheet, or other standard normal tools.

    Calculate $P(Z\leq z)$, the cumulative probability:

    (Round to the ten-thousandths place)

    Rating

    Archie would like a rating for her day’s performance. You decide to use the following scale; if $z$ is on a boundary, Archie will be given the higher rating.

    $z$ interval rating
    $-\infty$ to -1.51 F
    -1.5 to -0.51 D
    -0.5 to 0.49 C
    0.5 to 1.49 B
    1.5 to $\infty$ A

    [Figure]

    Determine Archie’s rating: A / B / C / D / F



    Solution

    The sampling distribution is approximately normal with mean 9.04 and standard error 0.149. The sampling distribution can be visualized, along with today’s mean (9.062) highlighted in red. The cumulative probability is the probability of a daily mean less than or equal to 9.062. Thus, the area highlighted in blue represents the cumulative probability.

    [Figure]


    1. The expected mean is $9.04$.
    2. The standard error is $\sigma/\sqrt{n} = 0.149$.
    3. Use the formula to calculate $z=0.15$.
    4. Use the $z$ table to find $P(Z\le z) = 0.5596$.
    5. The proper rating is C.
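    The whole calculation can be scripted. This is a minimal R sketch; `rate` is a hypothetical helper encoding the rating table, and it rates the rounded $z$ as the question asks. `findInterval(x, breaks)` counts how many breaks are less than or equal to `x`, so a $z$ exactly on a boundary falls into the higher rating.

```r
mu <- 9.04; sigma <- 1.03; n <- 48; xbar <- 9.062

expected_mean <- mu
se <- sigma / sqrt(n)               # standard error of the mean
z <- round((xbar - mu) / se, 2)     # standard score, rounded as asked
p <- pnorm(z)                       # cumulative probability P(Z <= z)

# Hypothetical helper: rating scale from the table above;
# a z exactly on a boundary gets the higher rating
rate <- function(z) {
  grades <- c("F", "D", "C", "B", "A")
  grades[findInterval(z, c(-1.5, -0.5, 0.5, 1.5)) + 1]
}

c(expected_mean = expected_mean, se = round(se, 3), z = z, p = round(p, 4))
rate(z)
```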

  83. Question

    This question provides $\mu$, $\sigma$, $n$, and $\sum x$. You will characterize the sampling distribution, calculate the standard score, determine the percentile rank (expressed as a decimal), and translate the score into a rating (see below).

    Population

    When Archie practices archery, each arrow has the same probability distribution, independent of the other arrows (see i.i.d.). This means her skill is constant, there is no hot-hand effect, and there is no maturity of chances.

    Over many months, Archie has shot ten thousand arrows (their positions are shown below as dots).

    [Figure]

    From those many arrows, Archie has determined an accurate probability distribution (of the points scored by an arrow).

    [Figure]

    Thus, Archie can determine her population mean and population standard deviation. $\mu=7.93$ $\sigma=1.66$

    Sampling distribution

    Each day, Archie shoots 108 arrows ($n=108$) and determines that day’s total score. A total of 1080 would be a perfect day. We can treat each day’s total as a random variable with its own probability distribution: a sampling distribution. We wish to characterize this sampling distribution. From the central limit theorem, we know the sampling distribution is approximately normal; however, we need to calculate the parameters.

    Expected total (mean of sampling distribution)

    The expected total is the product of the sample size and the population mean. $\text{expected total} = n\mu$

    “Expected total” is a misnomer. It is not necessarily likely, or even possible, for a total to equal the expected total. However, if Archie repeatedly shot 108 arrows, we expect the totals to have a mean equal to the expected total. So, maybe “average of totals” would be better terminology.

    This expected total is the average of the sampling distribution.

    Calculate the expected total:

    (Round to the hundredths place)

    Standard error (standard deviation of sampling distribution)

    The standard error of a total is the product of the population standard deviation and the square root of the sample size: $\text{standard error of total} = \sigma\sqrt{n}$. This standard error is the standard deviation of the sampling distribution.

    Calculate the standard error:

    (Round to the hundredths place)

    Sample

    Today, Archie’s total score is 896 points ($\sum x = 896$).

    [Figure: today's sample]

    Standard score

    Archie would like to know how well she did today. She wants you to calculate a standard score (a $z$ score). To calculate the standard score of a sample total, you can use the following formula: $z = \frac{\sum x - n\mu}{\sigma\sqrt{n}}$

    Calculate the standard score:

    (Round to the hundredths place)

    Cumulative probability (percentile rank)

    Archie would like to know the probability that tomorrow she shoots worse than today. To estimate this, report the cumulative probability associated with the standard score you calculated. This can be done with a $z$ table, the pnorm function in R, the NORM.DIST function in a spreadsheet, or other standard normal tools.

    Calculate $P(Z\leq z)$, the cumulative probability:

    (Round to the ten-thousandths place)

    Rating

    Archie would like a rating for her day’s performance. You decide to use the following scale; if $z$ falls exactly on a boundary, Archie is given the higher rating.

    $z$ interval           rating
    $-\infty$ to $-1.5$    F
    $-1.5$ to $-0.5$       D
    $-0.5$ to $0.5$        C
    $0.5$ to $1.5$         B
    $1.5$ to $\infty$      A

    [Figure: rating scale]

    Determine Archie’s rating: A / B / C / D / F



    Solution

    The sampling distribution is approximately normal with mean 856.44 and standard error 17.25. The sampling distribution can be visualized with today’s total (896) highlighted in red. The cumulative probability is the probability of a total less than or equal to 896, so the area highlighted in blue represents the cumulative probability.

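    [Figure: sampling distribution of the daily total, with today's total (896) in red and the cumulative probability shaded in blue]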


    1. The expected total is $n\mu = 856.44$
    2. The standard error is $\sigma\sqrt{n} = 17.25$
    3. Use the formula to calculate $z=2.29$
    4. Use the $z$ table to find $P(Z\le z) = 0.9890$
    5. The proper rating is A
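    The same steps can be checked in R. This is a minimal sketch using the values from this question (the variable names are illustrative). It rounds $z$ to the hundredths place before calling pnorm, to mirror a $z$-table lookup, and uses cut() with right = FALSE as one way to make a $z$ on a boundary fall into the higher rating.

```r
# Values from the question
mu <- 7.93       # population mean (points per arrow)
sigma <- 1.66    # population standard deviation
n <- 108         # arrows per day
total <- 896     # today's total

expected_total <- n * mu          # mean of the sampling distribution
se_total <- sigma * sqrt(n)       # standard error of the total
z <- round((total - expected_total) / se_total, 2)  # rounded as for a z table
cum_prob <- pnorm(z)              # P(Z <= z)

# Intervals are [a, b), so a z exactly on a boundary gets the higher rating
rating <- cut(z, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
              labels = c("F", "D", "C", "B", "A"), right = FALSE)

print(round(c(expected_total, se_total, z, cum_prob), 4))
print(as.character(rating))
```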

  84. Question

    A farm produces 4 types of fruit: coconuts, lemons, mangos, and oranges. The fruits’ masses have population parameters dependent on the type of fruit. (All values are in grams.)

    Type of fruit    Population mean ($\mu$)    Population standard deviation ($\sigma$)
    coconuts         673                        59
    lemons           129                        15
    mangos           179                        18
    oranges          223                        15

    A sample of each type is weighed. The results are shown below.

    Type of fruit    Sample size ($n$)    Sample mean ($\bar{x}$)
    coconut          144                  675.1
    lemon            49                   132
    mango            121                  177.4
    orange           100                  224

    The population parameters and sample statistics can be downloaded as a CSV file.

    For each sample, determine the mean’s standard score and the mean’s cumulative probability by assuming the requirements for the central limit theorem are met. Then determine which sample mean is most unusually large, most unusually small, most typically sized, and most unusually sized.

    The standard score of a sample mean is $z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$. Some people prefer to call this denominator the standard error: $\text{standard error of mean} = \text{SEM} = \frac{\sigma}{\sqrt{n}}$. So the standard score ($z$-score) of a sample mean can also be expressed as $z = \frac{\bar{x}-\mu}{\text{SEM}}$.


    1. Calculate a $z$-score for the coconut sample mean. (Round to the nearest hundredth.)
    2. What proportion of coconut samples of size $n=144$ have a smaller mean than the coconut sample?
    3. Calculate a $z$-score for the lemon sample mean. (Round to the nearest hundredth.)
    4. What proportion of lemon samples of size $n=49$ have a smaller mean than the lemon sample?
    5. Calculate a $z$-score for the mango sample mean. (Round to the nearest hundredth.)
    6. What proportion of mango samples of size $n=121$ have a smaller mean than the mango sample?
    7. Calculate a $z$-score for the orange sample mean. (Round to the nearest hundredth.)
    8. What proportion of orange samples of size $n=100$ have a smaller mean than the orange sample?
    9. The coconut sample mean is most unusually large. / The lemon sample mean is most unusually large. / The mango sample mean is most unusually large. / The orange sample mean is most unusually large.
    10. The coconut sample mean is most unusually small. / The lemon sample mean is most unusually small. / The mango sample mean is most unusually small. / The orange sample mean is most unusually small.
    11. The coconut sample mean is most typically sized. / The lemon sample mean is most typically sized. / The mango sample mean is most typically sized. / The orange sample mean is most typically sized.
    12. The coconut sample mean is most unusually sized. / The lemon sample mean is most unusually sized. / The mango sample mean is most unusually sized. / The orange sample mean is most unusually sized.

    Solution

    The $z$-score of a sample mean is a ratio: the numerator is the difference between the sample mean and the population mean, and the denominator is the standard error of the mean.

    $$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$

    The highest $z$-score (furthest right on the number line) corresponds to the most unusually large sample mean.

    The smallest $z$-score (furthest left on the number line) corresponds to the most unusually small sample mean.

    The smallest absolute $z$-score corresponds to the most typically sized sample mean.

    The largest absolute $z$-score corresponds to the most unusually sized sample mean.

    Spreadsheet:

    By using the formulas, you should get a spreadsheet like the one displayed here:

    [Figure: spreadsheet of the computed values]

    To do this in R:

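    # Read the population parameters and sample statistics
    data = read.csv("fruit.csv", as.is=TRUE)
    fruit = data$fruit
    mu = data$mu
    sigma = data$sigma
    n = data$n
    xbar = data$xbar
    
    # Standard error of the mean, z-scores (rounded as for a z table), and cumulative probabilities
    SEM = sigma/sqrt(n)
    z = round( (xbar-mu)/SEM ,2)
    cum_prob = round( pnorm(z) ,4)
    data.frame(data,SEM,z,cum_prob,abs(z))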
    ##     fruit  mu sigma   n  xbar      SEM     z cum_prob abs.z.
    ## 1 coconut 673    59 144 675.1 4.916667  0.43   0.6664   0.43
    ## 2   lemon 129    15  49 132.0 2.142857  1.40   0.9192   1.40
    ## 3   mango 179    18 121 177.4 1.636364 -0.98   0.1635   0.98
    ## 4  orange 223    15 100 224.0 1.500000  0.67   0.7486   0.67
    # Most unusually large
    fruit[z==max(z)]
    ## [1] "lemon"
    # Most unusually small
    fruit[z==min(z)]
    ## [1] "mango"
    # Most typically sized
    fruit[abs(z)==min(abs(z))]
    ## [1] "coconut"
    # Most unusually sized
    fruit[abs(z)==max(abs(z))]
    ## [1] "lemon"

    1. We use the formula for calculating the coconut sample-mean $z$ score: $z = \frac{675.1-673}{59/\sqrt{144}} = 0.43$
    2. Use the $z$ table to find the cumulative probability: $P(Z<0.43) = 0.6664$
    3. We use the formula for calculating the lemon sample-mean $z$ score: $z = \frac{132-129}{15/\sqrt{49}} = 1.4$
    4. Use the $z$ table to find the cumulative probability: $P(Z<1.4) = 0.9192$
    5. We use the formula for calculating the mango sample-mean $z$ score: $z = \frac{177.4-179}{18/\sqrt{121}} = -0.98$
    6. Use the $z$ table to find the cumulative probability: $P(Z<-0.98) = 0.1635$
    7. We use the formula for calculating the orange sample-mean $z$ score: $z = \frac{224-223}{15/\sqrt{100}} = 0.67$
    8. Use the $z$ table to find the cumulative probability: $P(Z<0.67) = 0.7486$
    9. The maximum $z$ score is $1.4$, which belongs to the lemon.
    10. The minimum $z$ score is $-0.98$, which belongs to the mango.
    11. The minimum absolute $z$ score is $0.43$, which belongs to the coconut.
    12. The maximum absolute $z$ score is $1.4$, which belongs to the lemon.

  85. Question

    The continuous random variable $X$ follows the distribution shown by the density curve and spinner below. It has a mean of $\mu_{X}=48.51$ and standard deviation of $\sigma_{X}=8.64$.

    [Figure: density curve of X]

    [Figure: spinner for X]

    That spinner ($X$) will be fairly spun 729 times, and the sample mean of the spins will be recorded. Determine the probability that the random sample mean is less than 48.03: $P(\bar{X}<48.03) = ?$

    Please approximate the random mean as a normal distribution with parameters suggested by the central limit theorem.


    Solution

    We use the central limit formulas for a random mean: $\mu_{\bar{X}} = \mu_{X} = 48.51$ and $\sigma_{\bar{X}} = \frac{\sigma_{X}}{\sqrt{n}} = \frac{8.64}{\sqrt{729}} = 0.32$. So, we think the random mean is normally distributed with a mean of 48.51 and a standard deviation of 0.32: $\bar{X} \sim N(48.51,\,0.32)$

    Calculate a $z$-score for the boundary: $z = \frac{48.03-48.51}{0.32} = -1.5$

    You can round that to $z=-1.5$ or $z=-1.49$. Any of the following probabilities will get credit: $P(Z < -1.5) = 0.0668072$, $P(Z < -1.5) = 0.0668$, $P(Z < -1.49) = 0.0681$.

    Let’s draw a sketch.

    [Figure: sketch of the normal curve for the sample mean with the probability region shaded]
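    In R, pnorm() accepts the mean and standard deviation directly, so the probability can be computed without forming $z$ by hand. A minimal sketch with this question's values (variable names are illustrative):

```r
# Spinner parameters and number of spins (from the question)
mu_x <- 48.51
sigma_x <- 8.64
n <- 729

# Central limit theorem parameters for the random sample mean
mu_xbar <- mu_x
sigma_xbar <- sigma_x / sqrt(n)   # 8.64 / 27 = 0.32

# P(Xbar < 48.03)
p_less <- pnorm(48.03, mean = mu_xbar, sd = sigma_xbar)
print(round(p_less, 7))
```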


  86. Question

    The continuous random variable $X$ follows the distribution shown by the density curve and spinner below. It has a mean of $\mu_{X}=90.1$ and standard deviation of $\sigma_{X}=2.98$.

    [Figure: density curve of X]

    [Figure: spinner for X]

    That spinner ($X$) will be fairly spun 93 times, and the sample mean of the spins will be recorded. Determine the probability that the random sample mean is more than 90.49: $P(\bar{X}>90.49) = ?$

    Please approximate the random mean as a normal distribution with parameters suggested by the central limit theorem.


    Solution

    We use the central limit formulas for a random mean: $\mu_{\bar{X}} = \mu_{X} = 90.1$ and $\sigma_{\bar{X}} = \frac{\sigma_{X}}{\sqrt{n}} = \frac{2.98}{\sqrt{93}} = 0.3090116$. So, we think the random mean is normally distributed with a mean of 90.1 and a standard deviation of 0.3090116: $\bar{X} \sim N(90.1,\,0.3090116)$

    Calculate a $z$-score for the boundary: $z = \frac{90.49-90.1}{0.3090116} = 1.2620885$

    You can round that to $z=1.26$ or $z=1.27$. Any of the following probabilities will get credit: $P(Z > 1.2620885) = 0.1034585$, $P(Z > 1.26) = 0.1038$, $P(Z > 1.27) = 0.102$.

    Let’s draw a sketch.

    [Figure: sketch of the normal curve for the sample mean with the probability region shaded]
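    For an upper-tail probability, pnorm() with lower.tail = FALSE returns $P(\bar{X} > x)$ directly, which avoids subtracting from 1. A minimal sketch with this question's values (variable names are illustrative):

```r
mu_x <- 90.1       # population mean
sigma_x <- 2.98    # population standard deviation
n <- 93            # number of spins

sigma_xbar <- sigma_x / sqrt(n)   # standard error of the mean
# P(Xbar > 90.49)
p_more <- pnorm(90.49, mean = mu_x, sd = sigma_xbar, lower.tail = FALSE)
print(round(c(sigma_xbar, p_more), 7))
```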


  87. Question

    The continuous random variable $X$ follows the distribution shown by the density curve and spinner below. It has a mean of $\mu_{X}=34.84$ and standard deviation of $\sigma_{X}=4.04$.

    [Figure: density curve of X]

    [Figure: spinner for X]

    That spinner ($X$) will be fairly spun 138 times, and the sample mean of the spins will be recorded. Determine the probability that the random sample mean is within $0.34$ units from $\mu_{\bar{X}}$: $P(|\bar{X}-\mu_{\bar{X}}|<0.34) = ?$

    Please approximate the random mean as a normal distribution with parameters suggested by the central limit theorem.


    Solution

    We use the central limit formulas for a random mean: $\mu_{\bar{X}} = \mu_{X} = 34.84$ and $\sigma_{\bar{X}} = \frac{\sigma_{X}}{\sqrt{n}} = \frac{4.04}{\sqrt{138}} = 0.3439076$. So, we think the random mean is normally distributed with a mean of 34.84 and a standard deviation of 0.3439076: $\bar{X} \sim N(34.84,\,0.3439076)$

    Calculate a $z$-score for the boundary: $z = \frac{0.34}{0.3439076} = 0.9886375$

    You can round that to $z=0.98$ or $z=0.99$. Any of the following probabilities will get credit: $P(|Z| < 0.9886375) = 0.6771595$, $P(|Z| < 0.98) = 0.673$, $P(|Z| < 0.99) = 0.6778$.

    Let’s draw a sketch.

    [Figure: sketch of the normal curve for the sample mean with the probability region shaded]
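    Because the normal curve is symmetric, $P(|Z| < z) = 2P(Z < z) - 1$, and the population mean itself is not needed, only the distance from it. A minimal R sketch of that shortcut with this question's values (variable names are illustrative):

```r
sigma_x <- 4.04   # population standard deviation
n <- 138          # number of spins
d <- 0.34         # allowed distance from the mean

sigma_xbar <- sigma_x / sqrt(n)    # standard error of the mean
z <- d / sigma_xbar                # distance measured in standard errors
p_within <- 2 * pnorm(z) - 1       # P(|Z| < z)
print(round(c(sigma_xbar, z, p_within), 7))
```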


  88. Question

    The continuous random variable $X$ follows the distribution shown by the density curve and spinner below. It has a mean of $\mu_{X}=49.57$ and standard deviation of $\sigma_{X}=1.87$.

    [Figure: density curve of X]

    [Figure: spinner for X]

    That spinner ($X$) will be fairly spun 79 times, and the sample mean of the spins will be recorded. Determine the probability that the random sample mean is farther than $0.24$ units from $\mu_{\bar{X}}$: $P(|\bar{X}-\mu_{\bar{X}}|>0.24) = ?$

    Please approximate the random mean as a normal distribution with parameters suggested by the central limit theorem.


    Solution

    We use the central limit formulas for a random mean: $\mu_{\bar{X}} = \mu_{X} = 49.57$ and $\sigma_{\bar{X}} = \frac{\sigma_{X}}{\sqrt{n}} = \frac{1.87}{\sqrt{79}} = 0.2103914$. So, we think the random mean is normally distributed with a mean of 49.57 and a standard deviation of 0.2103914: $\bar{X} \sim N(49.57,\,0.2103914)$

    Calculate a $z$-score for the boundary: $z = \frac{0.24}{0.2103914} = 1.1407308$

    You can round that to $z=1.14$ or $z=1.15$. Any of the following probabilities will get credit: $P(|Z| > 1.1407308) = 0.253982$, $P(|Z| > 1.14) = 0.2542$, $P(|Z| > 1.15) = 0.2502$.

    Let’s draw a sketch.

    [Figure: sketch of the normal curve for the sample mean with the probability region shaded]
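    For "farther than", the probability is the two equal tails, $P(|Z| > z) = 2P(Z < -z)$. A minimal R sketch with this question's values (variable names are illustrative):

```r
sigma_x <- 1.87   # population standard deviation
n <- 79           # number of spins
d <- 0.24         # distance from the mean

sigma_xbar <- sigma_x / sqrt(n)    # standard error of the mean
z <- d / sigma_xbar
p_farther <- 2 * pnorm(-z)         # both tails beyond -z and +z
print(round(c(sigma_xbar, z, p_farther), 7))
```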


  89. Question

    A fair 8-sided die (with sides numbered 1 through 8) will be rolled 64 times, and the sum (total) of those rolls will be recorded. What is the probability that the sum is less than 305.5? $P\left(\sum X < 305.5\right) = ?$

    To help you along, I will calculate the mean and standard deviation of single rolls. The formulas for an $N$-sided die can be derived from the formulas for the discrete uniform distribution: $\mu_{X} = \frac{N+1}{2} = \frac{8+1}{2} = 4.5$ and $\sigma_{X} = \sqrt{\frac{N^2-1}{12}} = \sqrt{\frac{8^2-1}{12}} = 2.2912878$.

    Please use a normal approximation based on the central limit theorem.


    Solution

    We calculate the mean and standard deviation of the random sum using the formulas from the central limit theorem.

    $$\begin{aligned} \mu_{\sum X} &= n\mu_{X} \\ &= (64)(4.5) \\ &= 288 \end{aligned}$$

    $$\begin{aligned} \sigma_{\sum X} &= \sigma_{X}\sqrt{n} \\ &= (2.2912878)\sqrt{64} \\ &= 18.3303028 \end{aligned}$$

    The central limit theorem tells us that the random sum is approximately normal with the parameters calculated above.

    $$\sum X \sim N(288,\,18.3303028)$$

    In other words, we can approximate the sum of 64 rolls of an 8-sided die with a single spin of the following spinner.

    [Figure: spinner approximating the sum of 64 rolls]

    Find the appropriate $z$ score.

    $$z = \frac{(\sum x)-\mu_{\sum X}}{\sigma_{\sum X}} = \frac{305.5-288}{18.3303028} = 0.9547033$$

    You can round this to either 0.95 or 0.96. I will continue with the unrounded $z$ score. We now rephrase the question as a standard normal probability.

    $$P\left(\sum X < 305.5\right) = P(Z<0.9547033) = ?$$

    You can use the $z$ table to find the probability.

    $$P(Z<0.9547033) = 0.8301361$$

    $$P(Z<0.95) = 0.8289 \qquad P(Z<0.96) = 0.8315$$

    We can sketch the density curve and shade the appropriate region.

    [Figure: density curve of the sum with the region below 305.5 shaded]
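    The single-roll parameters can also be checked directly from the faces 1 through 8 instead of the discrete uniform formulas. A minimal R sketch (variable names are illustrative). The standard deviation here divides by $N$ because the eight faces are the whole population, so R's sd(), which divides by $N-1$, would not match.

```r
faces <- 1:8
mu_x <- mean(faces)                       # mean of one roll
sigma_x <- sqrt(mean((faces - mu_x)^2))   # population SD, divides by N
n <- 64                                   # number of rolls

mu_sum <- n * mu_x                 # mean of the random sum
sigma_sum <- sigma_x * sqrt(n)     # standard deviation of the random sum
p_less <- pnorm(305.5, mean = mu_sum, sd = sigma_sum)
print(round(c(mu_x, sigma_x, mu_sum, sigma_sum, p_less), 7))
```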


  90. Question

    The continuous random variable $X$ follows the distribution shown by the density curve and spinner below. It has a mean of $\mu_{X}=83.68$ and standard deviation of $\sigma_{X}=5.73$.

    [Figure: density curve of X]

    [Figure: spinner for X]

    That spinner ($X$) will be fairly spun 184 times, and the total of the spins will be recorded. Determine the probability that the random total is more than 15400: $P\left(\sum X>15400\right) = ?$

    Please approximate the random total as a normal distribution with parameters suggested by the central limit theorem.


    Solution

    We use the central limit formulas for a random total: $\mu_{\sum X} = n\mu_{X} = (184)(83.68) = 15397.12$ and $\sigma_{\sum X} = \sigma_{X}\sqrt{n} = (5.73)\sqrt{184} = 77.7255016$. So, we think the random total is normally distributed with a mean of 15397.12 and a standard deviation of 77.7255016: $\sum X \sim N(15397.12,\,77.7255016)$

    Calculate a $z$-score for the boundary: $z = \frac{15400-15397.12}{77.7255016} = 0.0370535$

    You can round that to $z=0.03$ or $z=0.04$. Any of the following probabilities will get credit: $P(Z > 0.0370535) = 0.4852212$, $P(Z > 0.03) = 0.488$, $P(Z > 0.04) = 0.484$.

    Let’s draw a sketch.

    [Figure: sketch of the normal curve for the total with the probability region shaded]
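    The same upper-tail calculation for a random total, as a minimal R sketch with this question's values (variable names are illustrative):

```r
mu_x <- 83.68     # population mean
sigma_x <- 5.73   # population standard deviation
n <- 184          # number of spins

mu_sum <- n * mu_x                 # expected total
sigma_sum <- sigma_x * sqrt(n)     # standard deviation of the total
# P(total > 15400)
p_more <- pnorm(15400, mean = mu_sum, sd = sigma_sum, lower.tail = FALSE)
print(round(c(mu_sum, sigma_sum, p_more), 7))
```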


  91. Question

    The continuous random variable $X$ follows the distribution shown by the density curve and spinner below. It has a mean of $\mu_{X}=61.48$ and standard deviation of $\sigma_{X}=7.31$.

    [Figure: density curve of X]

    [Figure: spinner for X]

    That spinner ($X$) will be fairly spun 146 times, and the total of the spins will be recorded. Determine the probability that the random total is within 53.92 units from $\mu_{\sum X}$: $P\left(\left|\sum X-\mu_{\sum X}\right|<53.92\right) = ?$

    Please approximate the random total as a normal distribution with parameters suggested by the central limit theorem.


    Solution

    We use the central limit formulas for a random total: $\mu_{\sum X} = n\mu_{X} = (146)(61.48) = 8976.08$ and $\sigma_{\sum X} = \sigma_{X}\sqrt{n} = (7.31)\sqrt{146} = 88.3270661$. So, we think the random total is normally distributed with a mean of 8976.08 and a standard deviation of 88.3270661: $\sum X \sim N(8976.08,\,88.3270661)$

    Calculate a $z$-score for the boundary: $z = \frac{53.92}{88.3270661} = 0.6104584$

    You can round that to $z=0.61$ or $z=0.62$. Any of the following probabilities will get credit: $P(|Z| < 0.6104584) = 0.4584418$, $P(|Z| < 0.61) = 0.4582$, $P(|Z| < 0.62) = 0.4648$.

    Let’s draw a sketch.

    [Figure: sketch of the normal curve for the total with the probability region shaded]
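    For a "within" question about a total, only the distance and the standard deviation of the total matter; the area between $-z$ and $z$ can also be written as a difference of two pnorm() calls. A minimal R sketch with this question's values (variable names are illustrative):

```r
sigma_x <- 7.31   # population standard deviation
n <- 146          # number of spins
d <- 53.92        # distance from the expected total

sigma_sum <- sigma_x * sqrt(n)     # standard deviation of the total
z <- d / sigma_sum
p_within <- pnorm(z) - pnorm(-z)   # area between -z and z
print(round(c(sigma_sum, z, p_within), 7))
```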


  92. Question

    The continuous random variable $X$ follows the distribution shown by the density curve and spinner below. It has a mean of $\mu_{X}=35.02$ and standard deviation of $\sigma_{X}=3.38$.

    [Figure: density curve of X]

    [Figure: spinner for X]

    That spinner ($X$) will be fairly spun 75 times, and the total of the spins will be recorded. Determine the probability that the random total is farther than 33.5 units from $\mu_{\sum X}$: $P\left(\left|\sum X-\mu_{\sum X}\right|>33.5\right) = ?$

    Please approximate the random total as a normal distribution with parameters suggested by the central limit theorem.


    Solution

    We use the central limit formulas for a random total: $\mu_{\sum X} = n\mu_{X} = (75)(35.02) = 2626.5$ and $\sigma_{\sum X} = \sigma_{X}\sqrt{n} = (3.38)\sqrt{75} = 29.2716586$. So, we think the random total is normally distributed with a mean of 2626.5 and a standard deviation of 29.2716586: $\sum X \sim N(2626.5,\,29.2716586)$

    Calculate a $z$-score for the boundary: $z = \frac{33.5}{29.2716586} = 1.1444517$

    You can round that to $z=1.14$ or $z=1.15$. Any of the following probabilities will get credit: $P(|Z| > 1.1444517) = 0.2524364$, $P(|Z| > 1.14) = 0.2542$, $P(|Z| > 1.15) = 0.2502$.

    Let’s draw a sketch.

    [Figure: sketch of the normal curve for the total with the probability region shaded]
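    The "farther than" version for a total uses the two upper and lower tails; here lower.tail = FALSE gives the upper tail, which is doubled by symmetry. A minimal R sketch with this question's values (variable names are illustrative):

```r
sigma_x <- 3.38   # population standard deviation
n <- 75           # number of spins
d <- 33.5         # distance from the expected total

sigma_sum <- sigma_x * sqrt(n)     # standard deviation of the total
z <- d / sigma_sum
p_farther <- 2 * pnorm(z, lower.tail = FALSE)   # upper tail doubled
print(round(c(sigma_sum, z, p_farther), 7))
```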


  93. Question

    In some game, each trial has a $p = 0.15$ probability of success. A player will attempt 190 trials. What is the probability that the number of successes is less than 27.5?


    Solution

    We determine the mean and standard deviation of the binomial distribution: $\mu = np = (190)(0.15) = 28.5$ and $\sigma = \sqrt{np(1-p)} = \sqrt{(190)(0.15)(0.85)} = 4.9218899$.

    We determine a $z$-score. (In the de Moivre-Laplace notes, I used $\sum x$ to emphasize that a binomial variable is a sum of Bernoulli trials. Here, I will just use $x$ as the boundary for the number of successes, because this is the more common notation.)

    $$\begin{aligned} z &= \frac{x - \mu}{\sigma} \\ &= \frac{27.5-28.5}{4.9218899} \\ &= -0.203174 \end{aligned}$$

    Then, find the standard normal probability.

    $$P(Z < -0.203174) = 0.4194995$$

    Of course, you could have rounded $z$, so the following will also get credit: $P(Z < -0.21) = 0.4168$ and $P(Z < -0.2) = 0.4207$.
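    The normal approximation can be compared with the exact binomial probability in R. Since the number of successes is a whole number, "less than 27.5" is the same event as "at most 27", so pbinom(27, ...) gives the exact value that the normal curve approximates. A minimal sketch (the exact comparison is an addition, not part of the exercise; variable names are illustrative):

```r
n <- 190     # number of trials
p <- 0.15    # probability of success

mu <- n * p                          # binomial mean
sigma <- sqrt(n * p * (1 - p))       # binomial standard deviation
p_normal <- pnorm(27.5, mean = mu, sd = sigma)   # normal approximation
p_exact <- pbinom(27, size = n, prob = p)        # exact P(successes <= 27)
print(round(c(p_normal, p_exact), 7))
```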


  94. Question

    In some game, each trial has a $p = 0.5$ probability of success. A player will attempt 90 trials. What is the probability that the number of successes is more than 52.5?


    Solution

    We determine the mean and standard deviation of the binomial distribution: $\mu = np = (90)(0.5) = 45$ and $\sigma = \sqrt{np(1-p)} = \sqrt{(90)(0.5)(0.5)} = 4.7434165$.

    We determine a $z$-score. (In the de Moivre-Laplace notes, I used $\sum x$ to emphasize that a binomial variable is a sum of Bernoulli trials. Here, I will just use $x$ as the boundary for the number of successes, because this is the more common notation.)

    $$\begin{aligned} z &= \frac{x - \mu}{\sigma} \\ &= \frac{52.5-45}{4.7434165} \\ &= 1.5811388 \end{aligned}$$

    Then, find the standard normal probability.

    $$P(Z > 1.5811388) = 0.0569231$$

    Of course, you could have rounded $z$, so the following will also get credit: $P(Z > 1.58) = 0.0571$ and $P(Z > 1.59) = 0.0559$.
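    The upper-tail version in R, again with the exact binomial value alongside for comparison (an addition, not part of the exercise). "More than 52.5" means 53 or more successes, which is pbinom(52, ..., lower.tail = FALSE). A minimal sketch with illustrative variable names:

```r
n <- 90     # number of trials
p <- 0.5    # probability of success

mu <- n * p
sigma <- sqrt(n * p * (1 - p))
p_normal <- pnorm(52.5, mean = mu, sd = sigma, lower.tail = FALSE)
# Exact: P(successes > 52), i.e., 53 or more
p_exact <- pbinom(52, size = n, prob = p, lower.tail = FALSE)
print(round(c(p_normal, p_exact), 7))
```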


  95. Question

    In some game, each trial has a $p = 0.48$ probability of success. A player will attempt 195 trials. What is the probability that the number of successes is between 91.5 and 100.5?


    Solution

    We determine the mean and standard deviation of the binomial distribution: $\mu = np = (195)(0.48) = 93.6$ and $\sigma = \sqrt{np(1-p)} = \sqrt{(195)(0.48)(0.52)} = 6.9765321$.

    We determine both $z$-scores. (In the de Moivre-Laplace notes, I used $\sum x$ to emphasize that a binomial variable is a sum of Bernoulli trials. Here, I will just use $x$ as the boundary for the number of successes, because this is the more common notation.) We get the first $z$-score.

    $$\begin{aligned} z_1 &= \frac{x_1 - \mu}{\sigma} \\ &= \frac{91.5-93.6}{6.9765321} \\ &= -0.3010092 \end{aligned}$$

    We get the second $z$-score.

    $$\begin{aligned} z_2 &= \frac{x_2 - \mu}{\sigma} \\ &= \frac{100.5-93.6}{6.9765321} \\ &= 0.9890301 \end{aligned}$$

    Then, find the standard normal probability.

    $$P(-0.3010092 < Z < 0.9890301) = 0.456972$$

    Of course, you could have rounded the $z$-scores, so any of the following will also get credit: $P(-0.31 < Z < 0.98) = 0.4582$, $P(-0.3 < Z < 0.99) = 0.4568$, $P(-0.31 < Z < 0.99) = 0.4606$, $P(-0.3 < Z < 0.98) = 0.4544$.
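    The "between" probability is a difference of two cumulative probabilities. Here is a minimal R sketch, with the exact binomial probability of 92 through 100 successes added for comparison (not part of the exercise; variable names are illustrative):

```r
n <- 195     # number of trials
p <- 0.48    # probability of success

mu <- n * p
sigma <- sqrt(n * p * (1 - p))
p_normal <- pnorm(100.5, mu, sigma) - pnorm(91.5, mu, sigma)
# Exact: 92 through 100 successes
p_exact <- pbinom(100, n, p) - pbinom(91, n, p)
print(round(c(p_normal, p_exact), 7))
```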


  96. Question

    In some game, each trial has a $p = 0.62$ probability of success. A player will attempt 98 trials. What is the probability that the proportion of successes is less than 0.6276?


    Solution

    We determine the mean and standard deviation of the proportion sampling distribution: $\mu = p = 0.62$ and $\sigma = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.62)(0.38)}{98}} = 0.0490314$.

    We determine a $z$-score. (The common notation for a specific proportion is $\hat{p}$.)

    $$\begin{aligned} z &= \frac{\hat{p} - \mu}{\sigma} \\ &= \frac{0.6276-0.62}{0.0490314} \\ &= 0.1550026 \end{aligned}$$

    Then, find the standard normal probability.

    $$P(Z < 0.1550026) = 0.5615904$$

    Of course, you could have rounded $z$, so the following will also get credit: $P(Z < 0.15) = 0.5596$ and $P(Z < 0.16) = 0.5636$.
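    For a sample proportion, the only change from the earlier sketches is the standard deviation $\sqrt{p(1-p)/n}$. A minimal R sketch with this question's values (variable names are illustrative):

```r
n <- 98      # number of trials
p <- 0.62    # probability of success

se <- sqrt(p * (1 - p) / n)        # standard deviation of the sample proportion
p_less <- pnorm(0.6276, mean = p, sd = se)
print(round(c(se, p_less), 7))
```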


  97. Question

    In some game, each trial has a $p = 0.92$ probability of success. A player will attempt 135 trials. What is the probability that the proportion of successes is more than 0.9444?


    Solution

    We determine the mean and standard deviation of the proportion sampling distribution: $\mu = p = 0.92$ and $\sigma = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.92)(0.08)}{135}} = 0.0233492$.

    We determine a $z$-score. (The common notation for a specific proportion is $\hat{p}$.)

    $$\begin{aligned} z &= \frac{\hat{p} - \mu}{\sigma} \\ &= \frac{0.9444-0.92}{0.0233492} \\ &= 1.0450036 \end{aligned}$$

    Then, find the standard normal probability.

    $$P(Z > 1.0450036) = 0.1480106$$

    Of course, you could have rounded $z$, so the following will also get credit: $P(Z > 1.04) = 0.1492$ and $P(Z > 1.05) = 0.1469$.
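    The upper-tail version for a sample proportion, as a minimal R sketch with this question's values (variable names are illustrative):

```r
n <- 135     # number of trials
p <- 0.92    # probability of success

se <- sqrt(p * (1 - p) / n)        # standard deviation of the sample proportion
p_more <- pnorm(0.9444, mean = p, sd = se, lower.tail = FALSE)
print(round(c(se, p_more), 7))
```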


  98. Question

    In some game, each trial has a $p = 0.7$ probability of success. A player will attempt 191 trials. What is the probability that the proportion of successes is between 0.7199 and 0.7356?


    Solution

    We determine the mean and standard deviation of the proportion sampling distribution: $\mu = p = 0.7$ and $\sigma = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.7)(0.3)}{191}} = 0.0331584$.

    We determine both $z$-scores. We get the first $z$-score.

    $$\begin{aligned} z_1 &= \frac{\hat{p}_1 - \mu}{\sigma} \\ &= \frac{0.7199-0.7}{0.0331584} \\ &= 0.6000083 \end{aligned}$$

    We get the second $z$-score.

    $$\begin{aligned} z_2 &= \frac{\hat{p}_2 - \mu}{\sigma} \\ &= \frac{0.7356-0.7}{0.0331584} \\ &= 1.0736991 \end{aligned}$$

    Then, find the standard normal probability.

    $$P(0.6000083 < Z < 1.0736991) = 0.1327716$$

    Of course, you could have rounded the $z$-scores, so any of the following will also get credit: $P(0.6 < Z < 1.07) = 0.132$, $P(0.61 < Z < 1.08) = 0.1308$, $P(0.6 < Z < 1.08) = 0.1342$, $P(0.61 < Z < 1.07) = 0.1286$.
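    A minimal R sketch of the "between" probability for a sample proportion (variable names are illustrative). It uses the displayed boundaries 0.7199 and 0.7356, which are rounded. The result can therefore differ from the unrounded value above in about the fifth decimal place.

```r
n <- 191    # number of trials
p <- 0.7    # probability of success

se <- sqrt(p * (1 - p) / n)        # standard deviation of the sample proportion
p_between <- pnorm(0.7356, mean = p, sd = se) - pnorm(0.7199, mean = p, sd = se)
print(round(c(se, p_between), 7))
```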


  99. Question

    The following questions use $Z$ to refer to the standard normal variable. You will determine some probabilities and some boundaries.


    1. Determine $z$ such that $P(Z<z)=0.0708$
    2. Determine $P(-1.44<Z<1.44)$
    3. Determine $P(Z<0.26)$
    4. Determine $z$ such that $P(Z>z)=0.4364$
    5. Determine $P(Z>0.47)$

    Solution

    To do this problem, you should practice the Standard Normal exercises.


    1. -1.47
    2. 0.8502
    3. 0.6026
    4. 0.16
    5. 0.3192
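    In R, these look-ups correspond to pnorm() for probabilities and qnorm() for boundaries. A minimal sketch; because R does not round intermediate table values, an answer can differ from the table-based answer in the last decimal place.

```r
answers <- c(
  qnorm(0.0708),                       # 1. z with P(Z < z) = 0.0708
  pnorm(1.44) - pnorm(-1.44),          # 2. P(-1.44 < Z < 1.44)
  pnorm(0.26),                         # 3. P(Z < 0.26)
  qnorm(0.4364, lower.tail = FALSE),   # 4. z with P(Z > z) = 0.4364
  pnorm(0.47, lower.tail = FALSE)      # 5. P(Z > 0.47)
)
print(round(answers, 4))
```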

  100. Question

    Let random variable $X$ be normally distributed with mean $\mu=30$ and standard deviation $\sigma=9$.


    1. Determine $P(X>29.1)$
    2. Determine $x$ such that $P(X>x)=0.9641$
    3. Determine $P(28.2<X<31.8)$
    4. Determine $x$ such that $P(X<x)=0.4207$
    5. Determine $P(X<21.9)$

    Solution

    1. Identify the boundary: $x=29.1$. Get the $z$ score: $z=\frac{x-\mu}{\sigma}=-0.1$. With the $z$-table, determine $P(Z>-0.1)=0.5398$.
    2. Determine $z$ such that $P(Z>z)=0.9641$ (equivalently, $P(Z<z)=0.0359$): $z=-1.8$. Calculate the boundary: $x=\mu+z\sigma=13.8$.
    3. Identify the boundaries: $x_1=28.2$ and $x_2=31.8$. Get the $z$ scores: $z_1=\frac{x_1-\mu}{\sigma}=-0.2$ and $z_2=\frac{x_2-\mu}{\sigma}=0.2$. With the $z$-table, determine $P(-0.2<Z<0.2)=0.1586$.
    4. Determine $z$ such that $P(Z<z)=0.4207$: $z=-0.2$. Calculate the boundary: $x=\mu+z\sigma=28.2$.
    5. Identify the boundary: $x=21.9$. Get the $z$ score: $z=\frac{x-\mu}{\sigma}=-0.9$. With the $z$-table, determine $P(Z<-0.9)=0.1841$.
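    Since pnorm() and qnorm() accept mean and sd arguments, R can answer these directly on the scale of $X$ without converting to $z$. A minimal sketch; as with a $z$ table, small differences in the last decimal place are expected.

```r
mu <- 30; sigma <- 9
answers <- c(
  pnorm(29.1, mu, sigma, lower.tail = FALSE),      # 1. P(X > 29.1)
  qnorm(0.9641, mu, sigma, lower.tail = FALSE),    # 2. x with P(X > x) = 0.9641
  pnorm(31.8, mu, sigma) - pnorm(28.2, mu, sigma), # 3. P(28.2 < X < 31.8)
  qnorm(0.4207, mu, sigma),                        # 4. x with P(X < x) = 0.4207
  pnorm(21.9, mu, sigma)                           # 5. P(X < 21.9)
)
print(round(answers, 4))
```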

  101. Question

    Population

    Random variable $X$ has mean $\mu=55$ and standard deviation $\sigma=4$.

    Interval of typical measurements

    Let the interval of typical measurements be defined as having lower bound $\mu-2\sigma$ and upper bound $\mu+2\sigma$: $\text{interval of typical measurements} = (\mu-2\sigma,\, \mu+2\sigma)$

    Calculate the lower bound of the interval of typical measurements.

    Calculate the upper bound of the interval of typical measurements.

    Interval of typical totals

    Let the interval of typical totals be defined as having lower bound $n\mu-2\sigma\sqrt{n}$ and upper bound $n\mu+2\sigma\sqrt{n}$: $\text{interval of typical totals} = \left(n\mu-2\sigma\sqrt{n},\, n\mu+2\sigma\sqrt{n}\right)$

    For the listed values of $n$, determine the bounds.

    $n$    Lower bound of typical totals    Upper bound of typical totals
    25
    100
    400

    Interval of typical averages

    Let the interval of typical averages be defined as having lower bound $\mu-\frac{2\sigma}{\sqrt{n}}$ and upper bound $\mu+\frac{2\sigma}{\sqrt{n}}$.

    $\text{interval of typical averages} = \left(\mu-\frac{2\sigma}{\sqrt{n}},\, \mu+\frac{2\sigma}{\sqrt{n}}\right)$

    For the listed values of $n$, determine the bounds.

    $n$    Lower bound of typical averages    Upper bound of typical averages
    25
    100
    400


    Solution

    Interval of typical measurements

    $\text{lower bound} = 47$ and $\text{upper bound} = 63$

    Interval of typical totals

    $n$    Lower bound of typical totals    Upper bound of typical totals
    25     1335                             1415
    100    5420                             5580
    400    21840                            22160

    Interval of typical averages

    $n$    Lower bound of typical averages    Upper bound of typical averages
    25     53.4                               56.6
    100    54.2                               55.8
    400    54.6                               55.4
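    All three intervals can be computed at once in R by using a vector of sample sizes. A minimal sketch (variable names are illustrative). It also makes a pattern visible: the width of the totals interval, $4\sigma\sqrt{n}$, grows with $n$, while the width of the averages interval, $4\sigma/\sqrt{n}$, shrinks.

```r
mu <- 55; sigma <- 4
n <- c(25, 100, 400)

measurements <- c(lower = mu - 2 * sigma, upper = mu + 2 * sigma)
totals <- data.frame(n = n,
                     lower = n * mu - 2 * sigma * sqrt(n),
                     upper = n * mu + 2 * sigma * sqrt(n))
averages <- data.frame(n = n,
                       lower = mu - 2 * sigma / sqrt(n),
                       upper = mu + 2 * sigma / sqrt(n))
print(measurements); print(totals); print(averages)
```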


  102. Question

    Population

    Random variable $X$ follows a Bernoulli distribution with $p=0.4$. In this context, each spin is often called a trial. A “0” is a “fail” and a “1” is a “success”.

    [Figure: spinner for the Bernoulli variable X]

    The average of a Bernoulli random variable is equal to $p$: $\mu=p$. Determine $\mu$.

    The standard deviation of a Bernoulli random variable is equal to $\sqrt{p(1-p)}$: $\sigma = \sqrt{p(1-p)}$. Determine $\sigma$. (Round to the thousandths place.)

    Interval of typical success totals

    The binomial distribution predicts how many successes there will be for a given number of trials. Let the interval of typical successes be defined as having lower bound $np-2\sqrt{np(1-p)}$ and upper bound $np+2\sqrt{np(1-p)}$.

    $$\text{interval of typical successes} = \left(np-2\sqrt{np(1-p)},\ np+2\sqrt{np(1-p)}\right)$$

    For the listed values of $n$, determine the bounds. (Round to the nearest integer.)

    | $n$ | lower bound of typical successes | upper bound of typical successes |
    |---|---|---|
    | 25 |  |  |
    | 100 |  |  |
    | 400 |  |  |

    Interval of typical proportions

    Let the interval of typical proportions be defined as having lower bound $p-2\sqrt{\frac{p(1-p)}{n}}$ and upper bound $p+2\sqrt{\frac{p(1-p)}{n}}$.

    $$\text{interval of typical proportions} = \left(p-2\sqrt{\frac{p(1-p)}{n}},\ p+2\sqrt{\frac{p(1-p)}{n}}\right)$$

    For the listed values of $n$, determine the bounds. (Round to the nearest hundredth.)

    | $n$ | lower bound of typical proportions | upper bound of typical proportions |
    |---|---|---|
    | 25 |  |  |
    | 100 |  |  |
    | 400 |  |  |


    Solution

    Bernoulli

    $\mu = 0.4$, $\sigma = \sqrt{0.4(1-0.4)} = 0.4898979 \approx 0.490$

    Interval of typical successes

    | $n$ | lower bound of typical successes | upper bound of typical successes |
    |---|---|---|
    | 25 | 5 | 15 |
    | 100 | 30 | 50 |
    | 400 | 140 | 180 |

    Interval of typical proportions

    | $n$ | lower bound of typical proportions | upper bound of typical proportions |
    |---|---|---|
    | 25 | 0.20 | 0.60 |
    | 100 | 0.30 | 0.50 |
    | 400 | 0.35 | 0.45 |
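    Here is a minimal R sketch of the same calculations. It applies the Bernoulli formulas above with $p = 0.4$, and `round()` applies the rounding each part of the question asks for.

```r
p <- 0.4
n <- c(25, 100, 400)

mu <- p                    # mean of a Bernoulli variable
sigma <- sqrt(p * (1 - p)) # standard deviation of a Bernoulli variable

# Typical successes: n*p -/+ 2*sqrt(n*p*(1-p)), rounded to integers
successes <- data.frame(n,
                        lower = round(n * p - 2 * sqrt(n * p * (1 - p))),
                        upper = round(n * p + 2 * sqrt(n * p * (1 - p))))

# Typical proportions: p -/+ 2*sqrt(p*(1-p)/n), rounded to hundredths
proportions <- data.frame(n,
                          lower = round(p - 2 * sqrt(p * (1 - p) / n), 2),
                          upper = round(p + 2 * sqrt(p * (1 - p) / n), 2))

round(sigma, 3)
print(successes)
print(proportions)
```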


  103. Question

    A farm produces 4 types of fruit: mangos, bananas, oranges, and plums. The fruits’ masses have population parameters dependent on the type of fruit. (All values are in grams.)

    | Type of fruit | Population mean ($\mu$) | Population standard deviation ($\sigma$) |
    |---|---|---|
    | mangos | 219 | 19 |
    | bananas | 153 | 10 |
    | oranges | 191 | 24 |
    | plums | 64 | 6 |

    A sample of each type is weighed. The results are shown below.

    | Type of fruit | Sample size ($n$) | Sample mean ($\bar{x}$) |
    |---|---|---|
    | mango | 81 | 221.6 |
    | banana | 121 | 152.3 |
    | orange | 100 | 188.6 |
    | plum | 144 | 64.73 |

    The population parameters and sample statistics can be downloaded as a csv.

    For each sample, determine the mean’s standard score and the mean’s cumulative probability by assuming the requirements for the central limit theorem are met. Then determine which sample mean is most unusually large, most unusually small, most typically sized, and most unusually sized.

    The standard score of a sample mean is

    $$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$

    Some people prefer to call this denominator the standard error:

    $$\text{standard error of mean} = \text{SEM} = \frac{\sigma}{\sqrt{n}}$$

    So the standard score ($z$-score) of a sample mean can also be expressed as

    $$z = \frac{\bar{x}-\mu}{\text{SEM}}$$


    1. Calculate a $z$-score for the mango sample mean. (Round to the nearest hundredth.)
    2. What proportion of mango samples of size $n=81$ have a smaller mean than the mango sample?
    3. Calculate a $z$-score for the banana sample mean. (Round to the nearest hundredth.)
    4. What proportion of banana samples of size $n=121$ have a smaller mean than the banana sample?
    5. Calculate a $z$-score for the orange sample mean. (Round to the nearest hundredth.)
    6. What proportion of orange samples of size $n=100$ have a smaller mean than the orange sample?
    7. Calculate a $z$-score for the plum sample mean. (Round to the nearest hundredth.)
    8. What proportion of plum samples of size $n=144$ have a smaller mean than the plum sample?
    9. The mango sample mean is most unusually large. / The banana sample mean is most unusually large. / The orange sample mean is most unusually large. / The plum sample mean is most unusually large.
    10. The mango sample mean is most unusually small. / The banana sample mean is most unusually small. / The orange sample mean is most unusually small. / The plum sample mean is most unusually small.
    11. The mango sample mean is most typically sized. / The banana sample mean is most typically sized. / The orange sample mean is most typically sized. / The plum sample mean is most typically sized.
    12. The mango sample mean is most unusually sized. / The banana sample mean is most unusually sized. / The orange sample mean is most unusually sized. / The plum sample mean is most unusually sized.

    Solution

    The $z$-score of a sample mean is a ratio. The numerator is the difference between the sample mean and the population mean, and the denominator is the standard error of the mean.

    $$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$

    The highest $z$-score (furthest right on the number line) corresponds to the most unusually large sample mean.

    The lowest $z$-score (furthest left on the number line) corresponds to the most unusually small sample mean.

    The smallest absolute $z$-score corresponds to the most typically sized sample mean.

    The largest absolute $z$-score corresponds to the most unusually sized sample mean.

    Spreadsheet:

    By using the formulas, you should get a spreadsheet like the one displayed here:

    [Figure: spreadsheet of the calculations]

    To do this in R:

    data = read.csv("fruit.csv", as.is=TRUE)
    fruit = data$fruit
    mu = data$mu
    sigma = data$sigma
    n = data$n
    xbar = data$xbar
    
    SEM = sigma/sqrt(n)
    z = round( (xbar-mu)/SEM ,2)
    cum_prob = round( pnorm(z) ,4)
    data.frame(data,SEM,z,cum_prob,abs(z))
    ##    fruit  mu sigma   n   xbar       SEM     z cum_prob abs.z.
    ## 1  mango 219    19  81 221.60 2.1111111  1.23   0.8907   1.23
    ## 2 banana 153    10 121 152.30 0.9090909 -0.77   0.2206   0.77
    ## 3 orange 191    24 100 188.60 2.4000000 -1.00   0.1587   1.00
    ## 4   plum  64     6 144  64.73 0.5000000  1.46   0.9279   1.46
    # Most unusually large
    fruit[z==max(z)]
    ## [1] "plum"
    # Most unusually small
    fruit[z==min(z)]
    ## [1] "orange"
    # Most typically sized
    fruit[abs(z)==min(abs(z))]
    ## [1] "banana"
    # Most unusually sized
    fruit[abs(z)==max(abs(z))]
    ## [1] "plum"

    1. Use the formula for the $z$-score of the mango sample mean: $z = \frac{221.6-219}{19/\sqrt{81}} = 1.23$
    2. Use the $z$-table to find the cumulative probability: $P(Z<1.23) = 0.8907$
    3. Use the formula for the $z$-score of the banana sample mean: $z = \frac{152.3-153}{10/\sqrt{121}} = -0.77$
    4. Use the $z$-table to find the cumulative probability: $P(Z<-0.77) = 0.2206$
    5. Use the formula for the $z$-score of the orange sample mean: $z = \frac{188.6-191}{24/\sqrt{100}} = -1$
    6. Use the $z$-table to find the cumulative probability: $P(Z<-1) = 0.1587$
    7. Use the formula for the $z$-score of the plum sample mean: $z = \frac{64.73-64}{6/\sqrt{144}} = 1.46$
    8. Use the $z$-table to find the cumulative probability: $P(Z<1.46) = 0.9279$
    9. The maximum $z$-score is $1.46$, which belongs to the plum.
    10. The minimum $z$-score is $-1$, which belongs to the orange.
    11. The minimum absolute $z$-score is $0.77$, which belongs to the banana.
    12. The maximum absolute $z$-score is $1.46$, which belongs to the plum.

  104. Question

    This question provides $\mu$, $\sigma$, $n$, and $\sum x$. You will characterize the sampling distribution, calculate the standard score, determine the percentile rank (expressed as a decimal), and translate the score into a rating (see below).

    Population

    When Archie practices archery, the arrows are independent and each has the same probability distribution (i.i.d.). This means her skill is constant, there is no hot-hand effect, and there is no maturity of chances.

    Over many months, Archie has shot ten thousand arrows (their positions are shown below as dots).

    [Figure: positions of the ten thousand arrows, shown as dots]

    From those many arrows, Archie has determined an accurate probability distribution (of the points scored by an arrow).

    [Figure: probability distribution of the points scored by one arrow]

    Thus, Archie can determine her population mean and population standard deviation: $\mu=9.1$ and $\sigma=1$.

    Sampling distribution

    Each day, Archie shoots 48 arrows ($n=48$) and determines that day’s total score. A total of 480 would be a perfect day. We can treat each day’s total as a random variable with its own probability distribution: a sampling distribution. We wish to characterize this sampling distribution. The central limit theorem tells us the sampling distribution is approximately normal, but we still need to calculate its parameters.

    Expected total (mean of sampling distribution)

    The expected total is the product of the sample size and the population mean:

    $$\text{expected total} = n\mu$$

    “Expected total” is a misnomer. It is not necessarily likely, or even possible, for a total to equal the expected total. However, if Archie repeatedly shot 48 arrows, we expect the totals to have a mean equal to the expected total. So, maybe “average of totals” would be better terminology.

    This expected total is the average of the sampling distribution.

    Calculate the expected total:

    (Round to the hundredths place)

    Standard error (standard deviation of sampling distribution)

    The standard error of a total is the product of the population standard deviation and the square root of the sample size:

    $$\text{standard error of total} = \sigma \sqrt{n}$$

    This standard error is the standard deviation of the sampling distribution.

    Calculate the standard error:

    (Round to the hundredths place)

    Sample

    Today, Archie’s total score is 442 points ($\sum x = 442$).

    Standard score

    Archie would like to know how well she did today. She wants you to calculate a standard score (a $z$-score). To calculate the standard score of a sample total, you can use the following formula:

    $$z = \frac{\sum x - n\mu}{\sigma\sqrt{n}}$$

    Calculate the standard score:

    (Round to the hundredths place)

    Cumulative probability (percentile rank)

    Archie would like to know the probability that tomorrow she shoots worse than today. To estimate this, report the cumulative probability associated with the standard score you calculated. This can be done with a $z$-table, the pnorm function in R, the NORM.DIST function in a spreadsheet, or other standard normal tools.

    Calculate $P(Z\leq z)$, the cumulative probability:

    (Round to the ten-thousandths place)

    Rating

    Archie would like a rating for her day’s performance. You decide to use the following scale, and if $z$ is on a boundary, Archie will be given the higher rating.

    | $z$ interval | rating |
    |---|---|
    | $-\infty$ to $-1.5$ | F |
    | $-1.5$ to $-0.5$ | D |
    | $-0.5$ to $0.5$ | C |
    | $0.5$ to $1.5$ | B |
    | $1.5$ to $\infty$ | A |

    Determine Archie’s rating: A / B / C / D / F



    Solution

    The sampling distribution is approximately normal with mean 436.8 and standard error 6.93. The sampling distribution can be visualized, with today’s total (442) highlighted in red. The cumulative probability is the sum of the probabilities of totals less than or equal to 442, so the area highlighted in blue represents the cumulative probability.

    [Figure: sampling distribution of the daily total, with 442 in red and the cumulative area in blue]


    1. The expected total is $n\mu = 48(9.1) = 436.8$.
    2. The standard error is $\sigma\sqrt{n} = 1\cdot\sqrt{48} = 6.93$.
    3. Use the formula to calculate $z = \frac{442-436.8}{6.93} = 0.75$.
    4. Use the $z$-table to find $P(Z\le z) = 0.7734$.
    5. Because $0.5 \le 0.75 < 1.5$, the proper rating is B.
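    The whole calculation for Archie can be sketched in R. The sketch rounds $z$ to two decimals before calling `pnorm`, to mimic a $z$-table lookup. The `cut()` call uses `right = FALSE`, which closes each interval on the left. That is one way to implement the rule that a $z$ exactly on a boundary gets the higher rating. The variable names are illustrative.

```r
mu <- 9.1      # population mean points per arrow
sigma <- 1     # population standard deviation
n <- 48        # arrows per day
total <- 442   # today's total

expected_total <- n * mu          # mean of the sampling distribution
se_total <- sigma * sqrt(n)       # standard error of the total
z <- round((total - expected_total) / se_total, 2)  # rounded like a z table
cum_prob <- pnorm(z)              # P(Z <= z)

# Intervals closed on the left, so a boundary value gets the higher rating
rating <- cut(z, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
              labels = c("F", "D", "C", "B", "A"), right = FALSE)

cat(sprintf("expected total %.2f, SE %.2f, z %.2f, P %.4f, rating %s\n",
            expected_total, se_total, z, cum_prob, as.character(rating)))
```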

  105. Question

    Random variable $X$ is normally distributed with mean $\mu=65$ and standard deviation $\sigma=9$, written here as $X \sim N(65, 9)$ (the second parameter is the standard deviation).

    1. If $n=100$, determine $P\left(\sum X <6544\right)$.
    2. If $n=25$, determine $P\left(\sum X >1594\right)$.
    3. If $n=144$, determine $P\left(\overline{X}<63.87\right)$.
    4. Determine $P(X>66.9)$.
    5. If $n=121$, determine $P\left(\overline{X}>63.53\right)$.
    6. Determine $P(X<74.5)$.

    Solution

    1. Identify the boundary: $\sum x=6544$. Calculate the standard error of the total: $\text{SE} = \sigma\sqrt{n}=90$. Get the $z$-score: $z=\frac{\sum x-n\mu}{\text{SE}}=0.49$. With the $z$-table, determine $P(Z<0.49)=0.6879$.
    2. Identify the boundary: $\sum x=1594$. Calculate the standard error of the total: $\text{SE} = \sigma\sqrt{n}=45$. Get the $z$-score: $z=\frac{\sum x-n\mu}{\text{SE}}=-0.69$. With the $z$-table, determine $P(Z>-0.69)=0.7549$.
    3. Identify the boundary: $\bar{x}=63.87$. Calculate the standard error of the mean: $\text{SE} = \frac{\sigma}{\sqrt{n}}=0.75$. Get the $z$-score: $z=\frac{\bar{x}-\mu}{\text{SE}}=-1.51$. With the $z$-table, determine $P(Z<-1.51)=0.0655$.
    4. Identify the boundary: $x=66.9$. Get the $z$-score: $z=\frac{x-\mu}{\sigma}=0.21$. With the $z$-table, determine $P(Z>0.21)=0.4168$.
    5. Identify the boundary: $\bar{x}=63.53$. Calculate the standard error of the mean: $\text{SE} = \frac{\sigma}{\sqrt{n}}=0.81818$. Get the $z$-score: $z=\frac{\bar{x}-\mu}{\text{SE}}=-1.8$. With the $z$-table, determine $P(Z>-1.8)=0.9641$.
    6. Identify the boundary: $x=74.5$. Get the $z$-score: $z=\frac{x-\mu}{\sigma}=1.06$. With the $z$-table, determine $P(Z<1.06)=0.8554$.
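    These probabilities can also be computed directly with R's `pnorm`, passing the mean and standard error of the relevant sampling distribution. This sketch does not round $z$ to two decimals first, so its answers can differ from the $z$-table values above in the third or fourth decimal place.

```r
mu <- 65
sigma <- 9

probs <- c(
  # 1. n = 100: the total has mean n*mu and standard error sigma*sqrt(n)
  p1 = pnorm(6544, mean = 100 * mu, sd = sigma * sqrt(100)),
  # 2. n = 25: right tail of the total
  p2 = pnorm(1594, mean = 25 * mu, sd = sigma * sqrt(25), lower.tail = FALSE),
  # 3. n = 144: the mean has standard error sigma/sqrt(n)
  p3 = pnorm(63.87, mean = mu, sd = sigma / sqrt(144)),
  # 4. a single measurement uses sigma itself (right tail)
  p4 = pnorm(66.9, mean = mu, sd = sigma, lower.tail = FALSE),
  # 5. n = 121: right tail of the mean
  p5 = pnorm(63.53, mean = mu, sd = sigma / sqrt(121), lower.tail = FALSE),
  # 6. a single measurement (left tail)
  p6 = pnorm(74.5, mean = mu, sd = sigma)
)
round(probs, 4)
```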

  106. Question

    In some game, each trial has chance $p=0.43$ of success. In other words, $X$ is a Bernoulli random variable with parameter $p=0.43$. To determine the following probabilities, use the de Moivre–Laplace theorem (normal approximation). I have already applied the continuity correction when setting the boundaries.

    1. If $n=36$, what is the probability that the sample proportion ($\hat{p}$) is more than 0.4583? In other words, determine $P\left(\overline{X}>0.4583\right)$, which is the same as $P\left(\hat{p}>0.4583\right)$.
    2. If $n=81$, what is the probability that the sample proportion ($\hat{p}$) is less than 0.3889? In other words, determine $P\left(\overline{X}<0.3889\right)$, which is the same as $P\left(\hat{p}<0.3889\right)$.
    3. If $n=49$, what is the chance the number of successes is less than 15.5? In other words, determine $P\left(\sum X <15.5\right)$.
    4. If $n=64$, what is the chance the number of successes is more than 23.5? In other words, determine $P\left(\sum X >23.5\right)$.

    Solution

    You could first determine the mean and standard deviation of the Bernoulli variable $X$:

    $$\mu = p = 0.43 \qquad \sigma = \sqrt{p(1-p)} = 0.495076$$

    1. Identify the boundary: $\hat{p}=0.4583$. Calculate the standard error of the sample proportion: $\text{SE} = \frac{\sigma}{\sqrt{n}}=\sqrt{\frac{p(1-p)}{n}} =0.082513$. Get the $z$-score: $z=\frac{\hat{p}-\mu}{\text{SE}}=0.34$. With the $z$-table, determine $P(Z>0.34)=0.3669$.
    2. Identify the boundary: $\hat{p}=0.3889$. Calculate the standard error of the sample proportion: $\text{SE} = \frac{\sigma}{\sqrt{n}}= \sqrt{\frac{p(1-p)}{n}} = 0.055008$. Get the $z$-score: $z=\frac{\hat{p}-\mu}{\text{SE}}=-0.75$. With the $z$-table, determine $P(Z<-0.75)=0.2266$.
    3. Identify the boundary: $\sum x=15.5$. Calculate the standard error of the total: $\text{SE} = \sigma\sqrt{n}=\sqrt{np(1-p)} =3.4655$. Get the $z$-score: $z=\frac{\sum x-n\mu}{\text{SE}}=-1.61$. With the $z$-table, determine $P(Z<-1.61)=0.0537$.
    4. Identify the boundary: $\sum x=23.5$. Calculate the standard error of the total: $\text{SE} = \sigma\sqrt{n}=\sqrt{np(1-p)} =3.9606$. Get the $z$-score: $z=\frac{\sum x-n\mu}{\text{SE}}=-1.01$. With the $z$-table, determine $P(Z>-1.01)=0.8438$.
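    The normal approximations above can be sketched with `pnorm` in R, using the Bernoulli mean and standard deviation. Because $z$ is not rounded first, the results may differ slightly from the $z$-table answers. The final `pbinom` line is an optional cross-check that is not part of the exercise. It gives the exact binomial probability $P(\sum X \le 15)$ that the continuity-corrected boundary 15.5 approximates in item 3.

```r
p <- 0.43
mu <- p                      # Bernoulli mean
sigma <- sqrt(p * (1 - p))   # Bernoulli standard deviation

approx <- c(
  # 1. n = 36: P(p-hat > 0.4583), standard error sigma/sqrt(n)
  q1 = pnorm(0.4583, mean = mu, sd = sigma / sqrt(36), lower.tail = FALSE),
  # 2. n = 81: P(p-hat < 0.3889)
  q2 = pnorm(0.3889, mean = mu, sd = sigma / sqrt(81)),
  # 3. n = 49: P(sum X < 15.5), standard error sigma*sqrt(n)
  q3 = pnorm(15.5, mean = 49 * mu, sd = sigma * sqrt(49)),
  # 4. n = 64: P(sum X > 23.5)
  q4 = pnorm(23.5, mean = 64 * mu, sd = sigma * sqrt(64), lower.tail = FALSE)
)
round(approx, 4)

# Optional cross-check: exact binomial P(sum X <= 15) for n = 49
exact_q3 <- pbinom(15, size = 49, prob = p)
exact_q3
```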

  107. Question


    Random variable $T$ follows Student's $t$ distribution with $n-1$ degrees of freedom.

    1. If $n=33$, determine $P(T<1.69)$.
    2. If $n=3$, determine $P(T>6.96)$.
    3. If $n=32$, determine $P(|T|<2.74)$.
    4. If $n=19$, determine $P(|T|>2.1)$.
    5. If $n=9$, determine $t$ such that $P(T<t) = 0.995$.
    6. If $n=24$, determine $t$ such that $P(T>t) = 0.01$.
    7. If $n=15$, determine $t$ such that $P(|T|<t) = 0.8$.
    8. If $n=32$, determine $t$ such that $P(|T|>t) = 0.04$.

    Solution

    You can use a $t$-table. Remember, the degrees of freedom (df or $\nu$) is one less than $n$:

    $$\text{df} = n-1$$

    Spreadsheet

    You can use T.DIST and T.INV to calculate the values. But remember that these functions return or use left-area probabilities.

    R

    You can use pt and qt to calculate the values. But remember that, by default, these functions return or use left-area probabilities.


    1. When $n=33$, then $\text{df}=32$ and $P(T<1.69)=0.95$.
    2. When $n=3$, then $\text{df}=2$ and $P(T>6.96)=0.01$.
    3. When $n=32$, then $\text{df}=31$ and $P(|T|<2.74)=0.99$.
    4. When $n=19$, then $\text{df}=18$ and $P(|T|>2.1)=0.05$.
    5. When $n=9$, then $\text{df}=8$ and $P(T<3.36)=0.995$, so $t=3.36$.
    6. When $n=24$, then $\text{df}=23$ and $P(T>2.5)=0.01$, so $t=2.5$.
    7. When $n=15$, then $\text{df}=14$ and $P(|T|<1.35)=0.8$, so $t=1.35$.
    8. When $n=32$, then $\text{df}=31$ and $P(|T|>2.14)=0.04$, so $t=2.14$.
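    In R, the same answers come from `pt` and `qt`. By default both work with left areas, so right-tail and two-sided questions are first converted into left areas, as in this sketch.

```r
# Left-area probabilities from pt(); df = n - 1
a1 <- pt(1.69, df = 32)                        # P(T < 1.69), n = 33
a2 <- pt(6.96, df = 2, lower.tail = FALSE)     # P(T > 6.96), n = 3
a3 <- pt(2.74, df = 31) - pt(-2.74, df = 31)   # P(|T| < 2.74), n = 32
a4 <- 2 * pt(-2.1, df = 18)                    # P(|T| > 2.1), n = 19 (both tails)

# Quantiles from qt(), which takes a left-area probability
t5 <- qt(0.995, df = 8)                        # P(T < t) = 0.995, n = 9
t6 <- qt(1 - 0.01, df = 23)                    # P(T > t) = 0.01, n = 24
t7 <- qt(0.5 + 0.8 / 2, df = 14)               # P(|T| < t) = 0.8, n = 15
t8 <- qt(1 - 0.04 / 2, df = 31)                # P(|T| > t) = 0.04, n = 32

round(c(a1, a2, a3, a4), 2)
round(c(t5, t6, t7, t8), 2)
```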

  108. Question

    A scientist has weighed 49 specimens of a newly discovered organism. Those weights have a sample mean of $\bar{x}=19.9$ grams and a sample standard deviation of $s=6.5$ grams. The scientist hopes to construct a 95% confidence interval of the organism’s population mean ($\mu$).

    The scientist will later consult a statistician for a more precise method, but for now she will use a quick method to estimate the 95% confidence interval:

    $$\bar{x}\pm \frac{2s}{\sqrt{n}}$$

    (You can round to the nearest 0.1 grams.)

    1. Determine the lower boundary of the confidence interval by using $\bar{x}-\frac{2s}{\sqrt{n}}$.
    2. Determine the upper boundary of the confidence interval by using $\bar{x}+\frac{2s}{\sqrt{n}}$.

    Solution

    Plug the numbers into the expressions.


    1. Lower bound $= 19.9-2(6.5)/\sqrt{49} = 18.0428571 \approx 18.0$
    2. Upper bound $= 19.9+2(6.5)/\sqrt{49} = 21.7571429 \approx 21.8$
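    For reference, the quick "plus or minus two standard errors" interval is a one-line calculation in R. This is a sketch of the scientist's shortcut, not the more precise $t$-based method mentioned above.

```r
xbar <- 19.9   # sample mean (grams)
s <- 6.5       # sample standard deviation (grams)
n <- 49        # sample size

half_width <- 2 * s / sqrt(n)   # quick approximation: two estimated standard errors
ci <- c(lower = xbar - half_width, upper = xbar + half_width)
round(ci, 1)
```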

  109. Question

    A scientist has grown 170 specimens under novel conditions and found that 12.94% of them survived (in other words, $\hat{p}=0.1294$). The scientist hopes to construct a 95% confidence interval of the survival rate.

    The scientist will later consult a statistician for a more precise method, but for now she will use a quick method to estimate the 95% confidence interval:

    $$\hat{p}\pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

    (You can round answers to the nearest thousandth.)

    1. Determine the lower boundary of the confidence interval by using $\hat{p}- 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
    2. Determine the upper boundary of the confidence interval by using $\hat{p}+ 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.

    Solution

    Plug the numbers into the expressions.


    1. Lower bound $= 0.1294-2\sqrt{\frac{(0.1294)(1-0.1294)}{170}} = 0.0779148 \approx 0.078$
    2. Upper bound $= 0.1294+2\sqrt{\frac{(0.1294)(1-0.1294)}{170}} = 0.1808852 \approx 0.181$
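    The same quick interval for a proportion can be sketched in R. The only change from the mean version is that the estimated standard error is $\sqrt{\hat{p}(1-\hat{p})/n}$.

```r
p_hat <- 0.1294   # sample survival proportion
n <- 170          # number of specimens

half_width <- 2 * sqrt(p_hat * (1 - p_hat) / n)   # two estimated standard errors
ci <- c(lower = p_hat - half_width, upper = p_hat + half_width)
round(ci, 3)
```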

  110. Question

    In $n=236$ trials, there were $n_\text{s}=21$ successes, so the sample proportion is $\hat{p}=\frac{21}{236}=0.089$. You are tasked with determining the confidence interval (of the population proportion) with confidence level $\gamma=0.82$.

    To do this, first determine $z^\star$ such that $P(|Z|<z^\star)=0.82$. Then evaluate $\hat{p}\pm z^\star\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ to determine the boundaries of the confidence interval.

    (You can round answers to the hundredths place.)

    1. Determine the lower boundary of the confidence interval by using $\hat{p}- z^\star\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
    2. Determine the upper boundary of the confidence interval by using $\hat{p}+ z^\star\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.

    Solution

    First, determine $z^\star$ such that $P(|Z|<z^\star)=0.82$. It can help to draw a sketch.

    [Figure: standard normal curve with the middle area $P(|Z|<z^\star)=0.82$ shaded]

    The entire area under the density curve is 1. Thus, we can determine the area of each tail (using symmetry):

    $$\text{area of each tail} = \frac{1-\gamma}{2} = \frac{1-0.82}{2} = 0.09$$

    [Figure: the middle area and both tail areas]

    Determine the leftward area with boundary $z^\star$:

    $$P(Z<z^\star)=\gamma + \frac{1-\gamma}{2}= 0.82+0.09=0.91$$

    [Figure: the left area $P(Z<z^\star)=0.91$ shaded]

    We rephrase the puzzle: we wish to determine $z^\star$ such that $P(Z<z^\star)=0.91$. This is easy to determine using a $z$-table, qnorm in R, NORM.INV in spreadsheets, or other methods for evaluating the quantile function.

    ## Using R's qnorm:
    qnorm(0.91)
    ## [1] 1.340755

    $P(Z<1.34)=0.91$. In the original phrasing,

    $$P(|Z|<1.34)=0.82$$

    So, $z^\star = 1.34$. Now, use the given expressions to evaluate the boundaries.

    1. Lower bound $= 0.089-1.34\sqrt{\frac{(0.089)(1-0.089)}{236}} = 0.06$
    2. Upper bound $= 0.089+1.34\sqrt{\frac{(0.089)(1-0.089)}{236}} = 0.11$
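    The full procedure, from $\gamma$ to the interval, can be sketched in R as a single script. `qnorm` is evaluated at the left area $\gamma + \frac{1-\gamma}{2}$ derived above.

```r
n <- 236
n_s <- 21
p_hat <- n_s / n                          # sample proportion, about 0.089
gamma <- 0.82                             # confidence level

zstar <- qnorm(gamma + (1 - gamma) / 2)   # left area 0.91, about 1.34
ME <- zstar * sqrt(p_hat * (1 - p_hat) / n)
ci <- c(lower = p_hat - ME, upper = p_hat + ME)
round(ci, 2)
```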

  111. Question

    A population’s mean ($\mu$) is unknown, but its standard deviation is known: $\sigma=28.3$. A sample of size $n=216$ is taken, and the sample mean is calculated: $\bar{x}=89.5$. You are tasked with determining a confidence interval using a given confidence level: $\gamma=0.85$.

    To do this, you need to first determine $z^\star$ such that $P(|Z|<z^\star)=0.85$. Then, the boundaries are determined by evaluating $\bar{x}\pm \frac{z^\star \sigma}{\sqrt{n}}$.

    (You can round $z^\star$ to the nearest hundredth and the boundaries to the nearest tenth.)

    1. Determine the lower boundary of the confidence interval by using $\bar{x}-\frac{z^\star \sigma}{\sqrt{n}}$.
    2. Determine the upper boundary of the confidence interval by using $\bar{x}+\frac{z^\star \sigma}{\sqrt{n}}$.

    Solution

    First, determine $z^\star$ such that $P(|Z|<z^\star)=0.85$. It can help to draw a sketch.

    [Figure: standard normal curve with the middle area $P(|Z|<z^\star)=0.85$ shaded]

    The entire area under the density curve is 1. Thus, we can determine the area of each tail (using symmetry):

    $$\text{area of each tail} = \frac{1-\gamma}{2} = \frac{1-0.85}{2} = 0.075$$

    [Figure: the middle area and both tail areas]

    Determine the leftward area with boundary $z^\star$:

    $$P(Z<z^\star)=\gamma + \frac{1-\gamma}{2}= 0.85+0.075=0.925$$

    The expression can also be simplified:

    $$P(Z<z^\star)= \frac{1}{2}+\frac{\gamma}{2}= 0.5+0.425=0.925$$

    [Figure: the left area $P(Z<z^\star)=0.925$ shaded]

    We rephrase the puzzle: we wish to determine $z^\star$ such that $P(Z<z^\star)=0.925$. This is easy to determine using a $z$-table, qnorm in R, NORM.INV in spreadsheets, or other methods for evaluating the quantile function.

    $P(Z<1.44)=0.925$. In the original phrasing,

    $$P(|Z|<1.44)=0.85$$

    So, $z^\star = 1.44$. Now, use the given expressions.

    $$\text{Lower bound} = 89.5-(1.44)(28.3)/\sqrt{216} = 86.7$$

    $$\text{Upper bound} = 89.5+(1.44)(28.3)/\sqrt{216} = 92.3$$

    Spreadsheet

    A spreadsheet can use the CONFIDENCE.NORM function. It takes three arguments: the significance level $\alpha = 1-\gamma$, the population standard deviation $\sigma$, and the sample size $n$.

    The function returns the margin of error:

    $$\text{margin of error} = \frac{z^\star \sigma}{\sqrt{n}}$$

    So, in this case, if you typed =confidence.norm(1-0.85,28.3,216) in a spreadsheet, the result would be the margin of error, 2.7719202. You then subtract the margin of error from the sample mean and add it to the sample mean:

    $$\bar{x}-\text{ME} = 89.5-2.772 = 86.728$$

    $$\bar{x}+\text{ME} = 89.5+2.772 = 92.272$$

    You can do this with a spreadsheet.

    [Figure: spreadsheet calculation of the confidence interval]

    R

    xbar = 89.5
    sigma = 28.3
    n = 216
    gamma = 0.85
    zstar = qnorm(gamma+(1-gamma)/2)
    ME = zstar*sigma/sqrt(n)
    LB = xbar-ME
    UB = xbar+ME
    cat(sprintf("The lower bound: %.4f\nThe upper bound: %.4f",LB,UB))
    ## The lower bound: 86.7281
    ## The upper bound: 92.2719

    1. Lower bound $= 89.5-(1.44)(28.3)/\sqrt{216} = 86.7$
    2. Upper bound $= 89.5+(1.44)(28.3)/\sqrt{216} = 92.3$

  112. Question

    A population’s mean ($\mu$) and standard deviation ($\sigma$) are unknown, but the population is approximately normal. A sample of size $n=15$ is taken, and the sample mean is calculated: $\bar{x}=28.6$. The sample standard deviation is also calculated: $s=7.9$. You are tasked with determining a confidence interval using a given confidence level: $\gamma=0.96$.

    To do this, you need to first determine $t^\star$ such that $P(|T|<t^\star)=0.96$. Then, the boundaries are determined by evaluating $\bar{x}\pm \frac{t^\star s}{\sqrt{n}}$.

    (You can round $t^\star$ to the nearest hundredth and the boundaries to the nearest tenth.)

    1. Determine the lower boundary of the confidence interval by using $\bar{x}-\frac{t^\star s}{\sqrt{n}}$.
    2. Determine the upper boundary of the confidence interval by using $\bar{x}+\frac{t^\star s}{\sqrt{n}}$.

    Solution

    First, determine $t^\star$ such that $P(|T|<t^\star)=0.96$. It can help to draw a sketch.

    [Figure: $t$ density curve with the middle area $P(|T|<t^\star)=0.96$ shaded]

    The entire area under the density curve is 1. Thus, we can determine the area of each tail (using symmetry):

    $$\text{area of each tail} = \frac{1-\gamma}{2} = \frac{1-0.96}{2} = 0.02$$

    [Figure: the middle area and both tail areas]

    Thus, we determine a left area:

    $$P(T<t^\star)=\gamma + \frac{1-\gamma}{2}= 0.96+0.02=0.98$$

    The expression can also be simplified:

    $$P(T<t^\star)= \frac{1}{2}+\frac{\gamma}{2}= 0.5+0.48=0.98$$

    [Figure: the left area $P(T<t^\star)=0.98$ shaded]

    We rephrase the puzzle: we wish to determine $t^\star$ such that $P(T<t^\star)=0.98$, with $\text{df}=n-1=14$. This is easy to determine using a $t$-table, qt in R, T.INV in spreadsheets, or other methods for evaluating the quantile function.

    $P(T<2.26)=0.98$. In the original phrasing,

    $$P(|T|<2.26)=0.96$$

    So, $t^\star = 2.26$. Now, use the given expressions.

    $$\text{Lower bound} = 28.6-(2.26)(7.9)/\sqrt{15} = 24.0$$

    $$\text{Upper bound} = 28.6+(2.26)(7.9)/\sqrt{15} = 33.2$$

    Spreadsheet

    A spreadsheet can use the CONFIDENCE.T function. It takes three arguments: the significance level $\alpha = 1-\gamma$, the sample standard deviation $s$, and the sample size $n$.

    The function returns the margin of error:

    $$\text{margin of error} = \frac{t^\star s}{\sqrt{n}}$$

    So, in this case, if you typed =confidence.t(1-0.96,7.9,15) in a spreadsheet, the result would be the margin of error, 4.6175959. You then subtract the margin of error from the sample mean and add it to the sample mean:

    $$\bar{x}-\text{ME} = 28.6-4.618 = 23.982$$

    $$\bar{x}+\text{ME} = 28.6+4.618 = 33.218$$

    You can do this with a spreadsheet.

    [Figure: spreadsheet calculation of the confidence interval]

    R

    xbar = 28.6
    s = 7.9
    n = 15
    gamma = 0.96
    df = n-1   # df is the degrees of freedom
    tstar = qt(0.5+gamma/2, df)
    ME = tstar*s/sqrt(n)
    LB = xbar-ME
    UB = xbar+ME
    cat(sprintf("The lower bound: %.4f\nThe upper bound: %.4f",LB,UB))
    ## The lower bound: 23.9824
    ## The upper bound: 33.2176

    1. Lower bound $= 28.6-(2.26)(7.9)/\sqrt{15} = 24.0$
    2. Upper bound $= 28.6+(2.26)(7.9)/\sqrt{15} = 33.2$

  113. Question

    A sample was taken from a population. The measurements are shown below and can be downloaded as a csv.

    563, 561, 666, 615, 622, 590, 527, 614, 612, 577, 625, 557, 606, 601, 554, 603, 628, 569, 578, 650, 568, 631

    You are tasked with determining the confidence interval (of the population mean) with the confidence level $\gamma=0.96$.

    1. Determine $n$, the sample size.
    2. Determine $\bar{x}$, the sample mean.
    3. Determine $s$, the sample standard deviation (with Bessel correction).
    4. Calculate $\frac{s}{\sqrt{n}}$, the estimated standard error.
    5. Determine $t^\star$ such that $P(|T|<t^\star)=0.96$. Remember, $\text{df}=n-1$.
    6. Calculate $\frac{t^\star s}{\sqrt{n}}$, the estimated margin of error.
    7. Calculate $\bar{x}-\frac{t^\star s}{\sqrt{n}}$, the lower boundary of the 96% confidence interval.
    8. Calculate $\bar{x}+\frac{t^\star s}{\sqrt{n}}$, the upper boundary of the 96% confidence interval.

    Solution

    Spreadsheet

    You can download a solution spreadsheet. The top 13 rows are shown below:

    [Figure: top 13 rows of the solution spreadsheet]

    R

    x = c(563, 561, 666, 615, 622, 590, 527, 614, 612, 577, 625, 557, 606, 601, 554, 603, 628, 569, 578, 650, 568, 631)
    n = length(x)
    xbar = mean(x)
    s = sd(x)
    SE = s/sqrt(n)
    gamma = 0.96   # Probability T is between -tstar and tstar
    cumulative = gamma+(1-gamma)/2   # Probability T is less than tstar
    tstar = qt(cumulative, n-1)
    ME = tstar*s/sqrt(n)
    LB = xbar-ME
    UB = xbar+ME
    print(data.frame(n,xbar,s,SE,tstar,ME,LB,UB,row.names=""))
    ##   n     xbar        s       SE    tstar      ME       LB       UB
    ##  22 596.2273 34.70273 7.398646 2.189427 16.1988 580.0285 612.4261

    1. The sample size is $n=22$.
    2. The sample mean is $\bar{x}=\frac{\sum x}{n}=596.23$.
    3. The sample standard deviation is $s=\sqrt{\frac{\sum(x-\bar{x})^2}{n-1}}=34.7$.
    4. The estimated standard error is $\text{SE}=\frac{s}{\sqrt{n}}=\frac{34.7}{\sqrt{22}}=7.4$.
    5. When $n=22$, $\text{df}=21$ and $P(|T|<2.19)=0.96$, so the boundary is $t^\star=2.19$.
    6. The estimated margin of error is $\text{ME}=\frac{t^\star s}{\sqrt{n}}=\frac{(2.19)(34.7)}{\sqrt{22}}=16.2$.
    7. The lower boundary of the confidence interval is $\text{LB}=\bar{x}-\text{ME}=596.23-16.2=580.03$.
    8. The upper boundary of the confidence interval is $\text{UB}=\bar{x}+\text{ME}=596.23+16.2=612.43$.

  114. Question

    When Archie shoots archery, she records the horizontal position (xx) and vertical position (yy) of every arrow (in millimeters), using the bullseye as the origin.

    From years of shooting, Archie has determined that XX and YY are roughly bell-shaped with population standard deviations σx=96\sigma_x = 96 and σy=100\sigma_y=100 mm. However, Archie has a new sight, so her current population means (μx\mu_x and μy\mu_y) are unknown. (She hopes both are zero.)

    Archie wants to get confidence intervals for μx\mu_x and μy\mu_y, using a sample size n=36n=36 and a confidence level γ=0.98\gamma=0.98. Whether or not a confidence interval straddles 0 will determine whether Archie adjusts that aspect of her sight.

    Critical value

    Determine the boundary $z^\star$ such that $P(|Z|<z^\star)=0.98$:

    Margins of error

    The margin of error represents how much variation we expect in sample means. $$\text{ME}_x = \frac{z^\star \sigma_x}{\sqrt{n}} \qquad \text{ME}_y = \frac{z^\star \sigma_y}{\sqrt{n}}$$ Calculate $\text{ME}_x$, the margin of error when sampling $\bar{x}$:

    Calculate $\text{ME}_y$, the margin of error when sampling $\bar{y}$:

    Sample

    Archie shoots $n=36$ arrows.

    [Figure: plot of the 36 arrow positions]

    The exact positions can be downloaded as a csv.

    Sample means

    Calculate $\bar{x}$, the horizontal sample mean:

    Calculate $\bar{y}$, the vertical sample mean:

    Confidence intervals

    Calculate the 98% confidence interval of $\mu_x$ by using $\bar{x}\pm\text{ME}_x$.

    Calculate the lower boundary:

    Calculate the upper boundary:

    Calculate the 98% confidence interval of $\mu_y$ by using $\bar{y}\pm\text{ME}_y$.

    Calculate the lower boundary:

    Calculate the upper boundary:

    Inference

    If a confidence interval straddles 0, Archie will leave that aspect alone. If a confidence interval does not straddle 0, Archie will adjust that aspect on the sight.

    Does Archie adjust the horizontal aspect of her sight? Yes / No

    Does Archie adjust the vertical aspect of her sight? Yes / No



    Solution

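    1. $z^\star = 2.33$
    2. $\text{ME}_x = 37.2$
    3. $\text{ME}_y = 38.8$
    4. $\bar{x} = -49.2$
    5. $\bar{y} = 20.4$
    6. Lower boundary for $\mu_x$: $-86.4$
    7. Upper boundary for $\mu_x$: $-12$
    8. Lower boundary for $\mu_y$: $-18.3$
    9. Upper boundary for $\mu_y$: $59.2$
    10. The confidence interval of $\mu_x$ does not contain 0, so Archie does adjust the horizontal aspect of her sight.
    11. The confidence interval of $\mu_y$ does contain 0, so Archie does not adjust the vertical aspect of her sight.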
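    The steps of this solution can be reproduced with a short R sketch. It assumes the rounded sample means listed in the solution ($\bar{x}=-49.2$, $\bar{y}=20.4$) because the raw arrow positions are only in the downloadable csv; with that file loaded you would use `mean()` instead. The helper `straddles_zero` is a hypothetical name used only for illustration.

```r
# Known population standard deviations, sample size, and confidence level
sigma_x = 96
sigma_y = 100
n = 36
gamma = 0.98

# Critical value: P(|Z| < zstar) = gamma, so P(Z < zstar) = 0.5 + gamma/2
zstar = qnorm(0.5 + gamma/2)

# Margins of error for the two sample means
ME_x = zstar*sigma_x/sqrt(n)
ME_y = zstar*sigma_y/sqrt(n)

# Assumption: rounded sample means taken from the solution, not from the csv
xbar = -49.2
ybar = 20.4

ci_x = c(xbar - ME_x, xbar + ME_x)
ci_y = c(ybar - ME_y, ybar + ME_y)

# Hypothetical helper: TRUE when the interval contains 0
straddles_zero = function(ci) ci[1] < 0 && ci[2] > 0

# Adjust an aspect of the sight only when its interval misses 0
adjust_x = !straddles_zero(ci_x)
adjust_y = !straddles_zero(ci_y)

print(round(c(zstar = zstar, ME_x = ME_x, ME_y = ME_y), 2))
print(round(rbind(ci_x, ci_y), 1))
print(c(adjust_x = adjust_x, adjust_y = adjust_y))
```

    Because the sample means are rounded, the last digit of a boundary (for example, the lower boundary for $\mu_y$) can differ slightly from the solution.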

  115. Question

    A basketball player has decided to estimate her probability of scoring a free throw. To do this, she shoots free throws. If she scores, she records a “1”. If she misses, she records a “0”.

    The results are shown below and can be downloaded as a csv.

    1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

    You are asked to determine a confidence interval using the confidence level $\gamma=0.96$. To do this, you determine $z^\star$ such that $P(|Z|<z^\star)=0.96$. You also calculate the sample size and sample proportion. Then, you use the following formulas: $$\text{LB} = \hat{p}-z^\star \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \qquad \text{UB} = \hat{p}+z^\star \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$


    1. Determine the lower boundary of the confidence interval
    2. Determine the upper boundary of the confidence interval

    Solution

    You should be able to determine that $P(|Z|<2.05)=0.96$, so $z^\star=2.05$. Using a spreadsheet or R, you should determine that $n=140$ and $\hat{p} = \frac{107}{140} = 0.764$. Then, use the given formulas.

    R

    gamma = 0.96
    zstar = qnorm(0.5+gamma/2)
    x = c(1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
    n = length(x)
    phat = mean(x)
    LB = phat-zstar*sqrt(phat*(1-phat)/n)
    UB = phat+zstar*sqrt(phat*(1-phat)/n)
    print(data.frame(LB,UB,row.names=""))
    ##         LB       UB
    ##  0.6906134 0.837958

    Spreadsheet

    You can download a solution spreadsheet.

    The first 10 lines are shown here:

    [Figure: first 10 lines of the solution spreadsheet]


    1. The lower boundary: $$\begin{aligned} \text{LB} &= \hat{p}-z^\star\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\ &= (0.764)-(2.05)\sqrt{\frac{(0.764)(1-0.764)}{140}} \\ &= 0.691 \end{aligned}$$
    2. The upper boundary: $$\begin{aligned} \text{UB} &= \hat{p}+z^\star\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\ &= (0.764)+(2.05)\sqrt{\frac{(0.764)(1-0.764)}{140}} \\ &= 0.838 \end{aligned}$$

  116. Question

    We can approximate a 95% confidence interval (where the 95% refers to the confidence level: how frequently these intervals straddle the population mean) by using $\bar{x}\pm \frac{2 s}{\sqrt{n}}$: $$\text{lower boundary} = \bar{x}-\frac{2s}{\sqrt{n}} \qquad \text{upper boundary} = \bar{x}+\frac{2s}{\sqrt{n}}$$ where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the sample size. The 2 comes from the fact that approximately 95% of normal measurements land within 2 standard deviations of the mean.

    The quantity that we subtract from or add to the sample mean is called the margin of error. $$\text{margin of error} = \frac{2s}{\sqrt{n}}$$

    Using a confidence level of 0.95, if $s$ will be approximately 65 and we want the margin of error to be approximately 2.2, how large does the sample size need to be?

    You can round your answer to two significant digits.


    Solution

    Do some algebra. $$\text{ME} = \frac{2s}{\sqrt{n}}$$
    Multiply both sides by $\sqrt{n}$. $$\text{ME}\sqrt{n} = 2s$$
    Divide both sides by $\text{ME}$. $$\sqrt{n} = \frac{2s}{\text{ME}}$$
    Square both sides. $$n = \left(\frac{2s}{\text{ME}}\right)^2$$
    Plug in numbers. $$n = \left(\frac{(2)(65)}{2.2}\right)^2$$
    Evaluate. $$n = 3492$$

    The tolerance is $\pm 100$, so you can round to 2 significant digits, giving $n=3500$.
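    The algebra above can be checked with a short R sketch. `sample_size_mean` is a hypothetical helper name, not part of base R; it keeps the unrounded result so you can see where 3492 and 3500 come from.

```r
# Hypothetical helper: sample size for an approximate 95% interval of a mean,
# from n = (2 s / ME)^2
sample_size_mean = function(s, ME) (2*s/ME)^2

n_raw = sample_size_mean(s = 65, ME = 2.2)
print(n_raw)              # unrounded sample size
print(ceiling(n_raw))     # round up to a whole number of measurements
print(signif(n_raw, 2))   # two significant digits, as the tolerance allows
```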


  117. Question

    We can approximate a 95% confidence interval of a proportion (where the 95% refers to how frequently these intervals straddle the population proportion) by using $\hat{p} \pm \frac{2 \sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}}$: $$\text{lower boundary} = \hat{p} - \frac{2 \sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}} \qquad \text{upper boundary} = \hat{p} + \frac{2 \sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}}$$ where $\hat{p}$ is the sample proportion and $n$ is the sample size. The 2 comes from the fact that approximately 95% of normal measurements land within 2 standard deviations of the mean.

    The quantity that we subtract from or add to the sample proportion is called the margin of error. $$\text{margin of error} = \frac{2 \sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}}$$

    If we know $\hat{p}$ will be approximately 0.57, and we want the margin of error to be approximately 0.0043, then how large does the sample size need to be?

    You can round your answer to two significant digits.


    Solution

    Do some algebra. $$\text{ME} = \frac{2 \sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}}$$
    Multiply both sides by $\sqrt{n}$. $$\text{ME}\sqrt{n} = 2\sqrt{\hat{p}(1-\hat{p})}$$
    Divide both sides by $\text{ME}$. $$\sqrt{n} = \frac{2\sqrt{\hat{p}(1-\hat{p})}}{\text{ME}}$$
    Square both sides. $$n = \left(\frac{2\sqrt{\hat{p}(1-\hat{p})}}{\text{ME}}\right)^2$$
    Simplify. $$n = \frac{4\hat{p}(1-\hat{p})}{\text{ME}^2}$$

    Plug in numbers. $$n = \frac{4(0.57)(0.43)}{0.0043^2}$$
    Evaluate. $$n = 53023$$

    The tolerance is $\pm 1000$, so you can round to 2 significant digits, giving $n=53000$.
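    The same check works for a proportion. `sample_size_prop` is a hypothetical helper implementing the simplified formula $n = 4\hat{p}(1-\hat{p})/\text{ME}^2$ from the derivation above.

```r
# Hypothetical helper: sample size for an approximate 95% interval of a proportion,
# from n = 4 phat (1 - phat) / ME^2
sample_size_prop = function(phat, ME) 4*phat*(1 - phat)/ME^2

n_raw = sample_size_prop(phat = 0.57, ME = 0.0043)
print(n_raw)              # unrounded sample size
print(signif(n_raw, 2))   # two significant digits, as the tolerance allows
```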


  118. Question

    A scientist is investigating whether a chemical may affect the growth of an organism. Under the control conditions (no chemical), the organism grows to a mean mass of $\mu_0 = 35.1$ grams with a standard deviation of $\sigma_0 = 10.6$ grams. These values are known precisely because the organism has been grown under control conditions many, many times.

    The scientist has grown the organism under experimental conditions (with chemical) only $n=47$ times. In that sample, the masses have a mean $\bar{x}=32.46$ grams.

    The scientist wonders if this sample mean is significantly different from $\mu_0$. To investigate this, the scientist will determine the $p$-value. The $p$-value represents the probability of getting a sample mean as far (or farther) from $\mu_0$ due to chance alone. $$p\text{-value} = P\left(|Z| > \frac{|\bar{x}-\mu_0|}{\sigma_0/\sqrt{n}} \right)$$

    It is common to compare the $p$-value to 0.05.


    1. Determine the $p$-value.
    2. The difference is significant. The chemical seems to alter the growth of the organism. / The difference is not significant. We don’t know whether the chemical alters the growth of the organism.

    Solution

    We need to calculate the $p$-value. $$\begin{aligned} p\text{-value} &= P\left(|Z| > \frac{|\bar{x}-\mu_0|}{\sigma_0/\sqrt{n}} \right) \\ &= P\left(|Z| > \frac{|32.46-35.1|}{10.6/\sqrt{47}} \right) \\ &= P\left(|Z| > 1.7074461 \right) \\ &= 0.0877392 \end{aligned}$$

    Of course, you could have rounded the $z$ score. Either of the following will also get credit.

    $$p\text{-value} = P\left(|Z| > 1.71 \right) = 0.0872 \qquad p\text{-value} = P\left(|Z| > 1.7 \right) = 0.0892$$


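    1. 0.0877392
    2. The difference is not significant ($p$-value $> 0.05$), so we don’t know whether the chemical alters the growth of the organism.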
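    The $p$-value formula above translates directly into R, with `pnorm` giving the standard normal cumulative probability. This is a sketch using the rounded summary values from the problem, so the $z$ score can differ from 1.7074461 in the last few digits.

```r
# Known control-population parameters and the experimental sample summary
mu0 = 35.1
sigma0 = 10.6
n = 47
xbar = 32.46

z = abs(xbar - mu0)/(sigma0/sqrt(n))   # absolute z score
pvalue = 2*pnorm(-z)                   # two-tail probability P(|Z| > z)
significant = pvalue < 0.05            # compare to the usual 0.05 threshold

print(c(z = z, pvalue = pvalue))
print(significant)
```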

  119. Question

    A scientist is investigating whether a chemical may affect the survival rate of an organism. Under the control conditions (no chemical), the organism has a survival rate of $p_0=0.5846$. This value is known precisely because the organism has been grown under control conditions many, many times.

    The scientist has grown the organism under experimental conditions (with chemical) only $n=65$ times. In that sample, the survival rate is $\hat{p} = 0.7231$.

    The scientist wonders if this survival rate is significantly different from $p_0$. To investigate this, the scientist will determine the $p$-value. The $p$-value represents the probability of getting a sample proportion as far (or farther) from $p_0$ due to chance alone. $$p\text{-value} = P\left(|Z| > \frac{|\hat{p}-p_0|}{\sqrt{\frac{p_0(1-p_0)}{n}}} \right)$$

    It is common to compare the $p$-value to 0.05.


    1. Determine the $p$-value.
    2. The difference is significant. The chemical seems to alter the survival rate of the organism. / The difference is not significant. We don’t know whether the chemical alters the survival rate of the organism.

    Solution

    We need to calculate the $p$-value. $$\begin{aligned} p\text{-value} &= P\left(|Z| > \frac{|\hat{p}-p_0|}{\sqrt{\frac{p_0(1-p_0)}{n}}} \right) \\ &= P\left(|Z| > \frac{|0.7231-0.5846|}{\sqrt{\frac{(0.5846)(0.4154)}{65}}} \right) \\ &= P\left(|Z| > 2.265916 \right) \\ &= 0.0234565 \end{aligned}$$

    Of course, you could have rounded the $z$ score. Either of the following will also get credit.

    $$p\text{-value} = P\left(|Z| > 2.26 \right) = 0.0238 \qquad p\text{-value} = P\left(|Z| > 2.27 \right) = 0.0232$$


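    1. 0.0234565
    2. The difference is significant ($p$-value $< 0.05$): the chemical seems to alter the survival rate of the organism.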
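    For a proportion, the only change from the previous sketch is the standard error, which uses the null proportion $p_0$. Again this is a sketch on the given summary values, not the author's own script.

```r
# Null (control) survival rate and the experimental sample summary
p0 = 0.5846
n = 65
phat = 0.7231

z = abs(phat - p0)/sqrt(p0*(1 - p0)/n)   # absolute z score using the null proportion
pvalue = 2*pnorm(-z)                     # two-tail probability P(|Z| > z)
significant = pvalue < 0.05

print(c(z = z, pvalue = pvalue))
print(significant)
```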

  120. Question

    Archie worries the horizontal aspect of her sight may be off. She decides to run a one-sample $t$ test on the $x$ values of her next 50 arrows, using a significance level of $\alpha=0.05$. If the result is statistically significant, Archie will adjust her sight.

    Result

    [Figure: bullseye plot of the arrow positions]

    The horizontal positions are shown below and can be downloaded as a csv.

    -24, 13, 39, 50, 13, -29, 50, -41, 28, 110, 36, -10, 99, -108, -43, 28, -9, 23, -99, -61, -56, 3, 79, -38, 52, -100, -174, -38, -36, -77, 89, -21, -138, -20, -50, -93, -58, 12, -74, 65, -33, 47, -102, 121, -16, 9, -73, 37, -9, -136

    Analysis

    To get the $p$-value: $$p\text{-value} = P\left(|T| > \frac{|\bar{x}-\mu_0|}{s/\sqrt{n}} \right)$$

    In this situation, the null population mean is zero: $\mu_0=0$. In other words, the null hypothesis claims the sight is correct and the difference between $\bar{x}$ and 0 is just due to chance. The alternative hypothesis claims the sight needs to be adjusted: the difference between $\bar{x}$ and 0 is partly because the sight is off.

    Determine the $p$-value.

    Archie will adjust her sight if the $p$-value is less than 0.05, because this indicates statistical significance. Does Archie adjust her sight? Yes. / No.



    Solution

    Determine the sample statistics ($\bar{x}$ and $s$). I would recommend using a spreadsheet or R (or another computer-based method). $$\bar{x} = -15.26 \qquad s = 66.92$$ Evaluate the $t$ statistic.

    $$t = \frac{|-15.26-0|}{66.92/\sqrt{50}} = 1.61$$ Restate the $p$-value. $$p\text{-value} = P\left(|T|>1.61\right)$$ Remember, the degrees of freedom are one less than the sample size. $$\text{df} = 49$$

    At this point, there are various ways to determine the $p$-value. The least accurate way is to use the $t$-table. Go to the row with $\text{df}=n-1=50-1=49$.

    [Figure: t-table]

    In the row with df = 49, our calculated $t$, 1.61, is between 1.3 and 1.68. Thus, we know $P(|T|>1.61)$ is between 0.1 and 0.2. You could get pretty close by using linear interpolation. $$P(|T|>1.61)\approx \left(\frac{1.61-1.3}{1.68-1.3}\right)(0.1-0.2)+0.2 = 0.118$$

    The more accurate $p$-value is 0.1132777. This can be calculated using a computer, for example with an online $t$-distribution calculator (for full accuracy, you’ll need the more precise value $t=1.6124931$). You can also use a spreadsheet or R.

    Spreadsheet

    [Figure: solution spreadsheet]

    R

    You can do this problem VERY quickly with R.

    x = c(-24,13,39,50,13,-29,50,-41,28,110,36,-10,99,-108,-43,28,-9,23,-99,-61,-56,3,79,-38,52,-100,-174,-38,-36,-77,89,-21,-138,-20,-50,-93,-58,12,-74,65,-33,47,-102,121,-16,9,-73,37,-9,-136)
    t.test(x)
    ## 
    ##  One Sample t-test
    ## 
    ## data:  x
    ## t = -1.6125, df = 49, p-value = 0.1133
    ## alternative hypothesis: true mean is not equal to 0
    ## 95 percent confidence interval:
    ##  -34.277829   3.757829
    ## sample estimates:
    ## mean of x 
    ##    -15.26
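    If you want to see the hand calculation behind `t.test`, the following sketch computes the $t$ score from the formula and the two-tail probability with `pt`. The variable name `tscore` is my own choice (it avoids masking R's `t()` function).

```r
# Horizontal positions of the 50 arrows (same data as above)
x = c(-24,13,39,50,13,-29,50,-41,28,110,36,-10,99,-108,-43,28,-9,23,-99,-61,-56,3,79,-38,52,-100,-174,-38,-36,-77,89,-21,-138,-20,-50,-93,-58,12,-74,65,-33,47,-102,121,-16,9,-73,37,-9,-136)

n = length(x)
xbar = mean(x)
s = sd(x)
mu0 = 0                                    # null hypothesis: sight is centered

tscore = abs(xbar - mu0)/(s/sqrt(n))       # absolute t score
pvalue = 2*pt(-tscore, df = n - 1)         # two-tail probability with df = n - 1

print(c(xbar = xbar, s = s, t = tscore, pvalue = pvalue))
print(pvalue < 0.05)                       # TRUE would mean "adjust the sight"
```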


  121. Question

    Connie suspects a coin may be unfair when spun on its edge on a table. She decides to record some spins, using “0” for tails and “1” for heads. After those spins, she will run a one-proportion hypothesis test using a significance level $\alpha=0.05$.

    The data is shown below, and can be downloaded as a csv.

    1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0

    The null hypothesis states the coin is fair and any deviation between the sample proportion and 0.5 is merely due to chance and natural variation. Thus, the null population proportion is $p_0=0.5$.

    The alternative hypothesis states the coin is unfair, so the deviation between the sample proportion and 0.5 is at least partly due to the unfairness of the coin.

    The $p$-value will indicate the probability that a fair coin produces a sample proportion as extreme (or more extreme) in its deviation from 0.5. An approximate formula, using a normal approximation of the sampling distribution of the proportion and not using a continuity correction, is given:

    $$p\text{-value} \approx P\left(|Z| > \frac{|\hat{p}-p_0|}{\sqrt{p_0(1-p_0)/n}} \right)$$ If you want to make a continuity correction, your $p$-value is more accurate (and larger, more conservative). $$p\text{-value} \approx P\left(|Z| > \frac{(|\hat{p}-p_0|\cdot n-0.5)/n}{\sqrt{p_0(1-p_0)/n}} \right)$$

    And, if you want to be exactly correct, you need a computer (or a lot of time) to use the binomial distribution formulas.

    $$p\text{-value} = P\left(\left|B(n,p_0)-p_0n\right|\ge \left|n_\text{s}-p_0n\right|\right)$$ where $B(n,p_0)$ is a binomial random variable and $n_\text{s}$ is the number of heads (successes) in the sample.

    For this problem, you can use any of those three strategies.

    Determine the $p$-value.

    Connie compares the $p$-value to the significance level, $\alpha=0.05$. If the $p$-value is less than 0.05, Connie concludes the coin is unfair. Otherwise, Connie will conclude the coin MIGHT be fair but future measurements may still show the coin is unfair.

    Does Connie conclude the coin is unfair?

    Yes, the sample proportion is significantly far from 0.5, so Connie thinks the coin is unfair. / No, the sample proportion is NOT significantly far from 0.5, so Connie retains the belief that the coin MIGHT be fair.



    Solution

    First, I apologize for the notation, but for some reason this notation is ubiquitous. We have 4 different “p” variables.
    $$p\text{-value} = \text{probability that the null produces a result as extreme (or more extreme)}$$
    $$P(\,) = \text{general probability notation, with description in parentheses}$$
    $$p_0 = \text{the population proportion of the null hypothesis}$$
    $$\hat{p} = \text{the sample proportion}$$

    You need to determine the sample size ($n$) and sample proportion ($\hat{p}$).

    $$n = 177 \qquad \hat{p} = \frac{99}{177} = 0.5593$$

    You then calculate a $z$-score. For simplicity, we will not make the continuity correction if we are doing this by hand.

    $$z = \frac{|0.5593-0.5|}{\sqrt{0.5\cdot(1-0.5)/177}} = 1.58$$

    Restate the $p$-value. $$p\text{-value} = P(|Z|>1.58)$$

    You can use a $z$ table to determine the cumulative probability of $z=1.58$. $$P(Z<1.58) = 0.9429$$

    To calculate the $p$-value, you need to remember how to determine a two-tail probability from a cumulative (leftward) probability. $$p\text{-value} = 2\cdot(1-0.9429) = 0.1142$$

    Spreadsheet

    You can use any of the three methods shown in the spreadsheet:

    [Figure: solution spreadsheet]

    You can download a solution spreadsheet.

    R

    All three methods can be done easily in R:

    x = c(1,0,1,0,1,1,1,0,1,1,0,1,1,0,0,0,1,1,1,0,0,1,1,0,1,1,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,1,1,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,1,1,0,0,0,1,1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,0,0,0,0,1,0,1,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,1,1,0,0,1,0,1,0,1,0,0,0,0,0,1,1,0,0,0,1,1,0,1,1,1,0,0,1,0,0,1,0,0,1,0,1,1,1,1,1,0,0,0,1,1,0,0,1,1,1,1,0,1,1,0,0,1,0,1,0,0)
    prop.test(sum(x),length(x),0.5,correct=F)
    ## 
    ##  1-sample proportions test without continuity correction
    ## 
    ## data:  sum(x) out of length(x), null probability 0.5
    ## X-squared = 2.4915, df = 1, p-value = 0.1145
    ## alternative hypothesis: true p is not equal to 0.5
    ## 95 percent confidence interval:
    ##  0.4856922 0.6304316
    ## sample estimates:
    ##        p 
    ## 0.559322
    prop.test(sum(x),length(x),0.5)
    ## 
    ##  1-sample proportions test with continuity correction
    ## 
    ## data:  sum(x) out of length(x), null probability 0.5
    ## X-squared = 2.2599, df = 1, p-value = 0.1328
    ## alternative hypothesis: true p is not equal to 0.5
    ## 95 percent confidence interval:
    ##  0.4828804 0.6331471
    ## sample estimates:
    ##        p 
    ## 0.559322
    binom.test(sum(x),length(x),0.5)
    ## 
    ##  Exact binomial test
    ## 
    ## data:  sum(x) and length(x)
    ## number of successes = 99, number of trials = 177, p-value =
    ## 0.1325
    ## alternative hypothesis: true probability of success is not equal to 0.5
    ## 95 percent confidence interval:
    ##  0.4828872 0.6337364
    ## sample estimates:
    ## probability of success 
    ##               0.559322
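    The three formulas from the problem statement can also be evaluated directly, which shows where the `prop.test` and `binom.test` numbers come from. This sketch assumes the counts from the solution (99 heads in 177 spins) rather than re-reading the data vector.

```r
# Counts from the solution: 99 heads out of 177 spins; null proportion 0.5
ns = 99
n = 177
p0 = 0.5
phat = ns/n
se0 = sqrt(p0*(1 - p0)/n)            # standard error under the null hypothesis

# 1. Normal approximation without continuity correction
z_plain = abs(phat - p0)/se0
p_plain = 2*pnorm(-z_plain)

# 2. Normal approximation with continuity correction (shrink |ns - p0 n| by 0.5)
z_cc = ((abs(phat - p0)*n - 0.5)/n)/se0
p_cc = 2*pnorm(-z_cc)

# 3. Exact binomial: P(|B(n, p0) - p0 n| >= |ns - p0 n|), summed over all outcomes k
k = 0:n
p_exact = sum(dbinom(k, n, p0)[abs(k - p0*n) >= abs(ns - p0*n)])

print(round(c(plain = p_plain, corrected = p_cc, exact = p_exact), 4))
```

    Squaring `z_plain` and `z_cc` gives the X-squared statistics that `prop.test` reports without and with the continuity correction.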


  122. Question

    A study asked individuals to time a mile run (in seconds). After a month, the same individuals timed another mile run. You are asked to perform a paired-data $t$ test to investigate whether fitness changed.

    name        x1    x2
    Charlotte  654   641
    Emani      751   717
    Hudson     305   253
    Jaelani    305   229
    Jayden     945   919
    Julianne   499   557
    Lea        362   285
    Luke       608   672
    Ruby       485   402
    Zaina      499   415
    Zaina 499 415

    (download as run_times.csv)

    To do this, first determine the list of differences. For each individual, $i$, determine their difference $d_i$. $$d_i = x_{2,i}-x_{1,i}$$

    Then, determine the mean ($\bar{d}$) and standard deviation ($s_d$) of the differences. The $t$ score is then calculated to determine the $p$-value.

    $$p\text{-value} = P\left(|T|>\frac{|\bar{d}-\mu_0|}{s_d/\sqrt{n}}\right)$$ The null hypothesis predicts there is no change in fitness, so $\mu_0=0$. The alternative hypothesis predicts a change in fitness. The degrees of freedom are one less than the number of individuals.

    Calculate the $p$-value.

    Using a significance level of α=0.05\alpha=0.05, is there a significant change in run times?

    Yes / No



    Solution

    You first need to determine a list of differences.

    name        x1    x2   d = x2-x1
    Charlotte  654   641         -13
    Emani      751   717         -34
    Hudson     305   253         -52
    Jaelani    305   229         -76
    Jayden     945   919         -26
    Julianne   499   557          58
    Lea        362   285         -77
    Luke       608   672          64
    Ruby       485   402         -83
    Zaina      499   415         -84

    Determine the sample size and degrees of freedom. $$n = 10 \qquad \text{df} = 9$$ Determine the sample mean of the differences. $$\bar{d} = \frac{\sum d}{n} = \frac{-323}{10} = -32.3$$ Determine the standard deviation of the differences. You probably do not want to do this by hand. $$s_d = \sqrt{\frac{\sum (d-\bar{d})^2}{n-1}} = 55.16$$ Calculate the $t$-score. $$t = \frac{|\bar{d}-0|}{s_d/\sqrt{n}} = \frac{32.3}{55.16/\sqrt{10}} = 1.85$$ Restate the $p$-value. $$p\text{-value} = P\left(|T|>1.85\right)$$ I would recommend using a computer program to determine this $p$-value, like a spreadsheet or R.

    $$p\text{-value} = 0.097$$

    The $p$-value is more than 0.05, so the result is NOT significant.

    Spreadsheet

    The solution spreadsheet can be downloaded as a csv. The first 10 rows are shown below.

    [Figure: first 10 rows of the solution spreadsheet]

    If your spreadsheet does not have the TDIST function, you can try T.DIST.2T(G6,G3), which takes only the $t$ score and the degrees of freedom. Also, notice that T.TEST does everything for you (use tails = 2 and type = 1 for paired data), so you can just use that.

    R

    Make/use a directory (folder) for this problem (paired-data tt test), and set the working directory accordingly. Save run_times.csv to your working directory. I would recommend saving the following script as paired_data_hypothesis_test.r, in the same directory.

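    runs = read.csv("run_times.csv")   # columns: name, x1, x2
    x1 = runs[["x1"]]
    x2 = runs[["x2"]]
    # paired=TRUE analyzes x1 - x2, so t is positive here even though d = x2 - x1 is negative
    t.test(x1,x2,paired=TRUE)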
    ## 
    ##  Paired t-test
    ## 
    ## data:  x1 and x2
    ## t = 1.8518, df = 9, p-value = 0.09707
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -7.157984 71.757984
    ## sample estimates:
    ## mean of the differences 
    ##                    32.3

    Or, if you wanted to do it the long way:

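    runs = read.csv("run_times.csv")
    x1 = runs[["x1"]]
    x2 = runs[["x2"]]
    d = x2-x1                          # differences: second run minus first run
    n = length(d)
    t = abs(mean(d))/(sd(d)/sqrt(n))   # absolute t score (mu0 = 0)
    cumulative = pt(t,n-1)             # P(T < t) with df = n-1
    pvalue = 2*(1-cumulative)          # two-tail p-value
    pvalue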
    ## [1] 0.09707493


  123. Question

    A doctor runs a controlled experiment. The participants are randomly assigned to two groups: control and treatment. The participants in the control group are given a placebo. The participants in the treatment group are given a drug.

    After a month, each participant’s triglyceride level (in mg/dL) is measured. Those measurements are shown below. They can also be downloaded as a csv.

    ## Control: 262.8, 276.4, 243, 237.9, 191.4, 264.4, 202.7, 235.1, 299.3, 300.3, 243.3, 316.9, 249, 332.3, 243.9, 222.5, 203.5, 290.9, 263.4, 271, 212.5
    ## 
    ## Treatment: 372.5, 247.7, 278.5, 279.2, 313.1, 242.6, 235.5, 259.4, 270.9, 208.1, 219.2, 245.3, 271.1, 267.7, 338.5, 310.1, 291.8, 243.6, 289.1, 294.4, 289.9, 285.8

    You are asked to perform a two-tail two-sample Welch’s $t$ test to determine whether there is a significant difference of means in the two samples.

    To do this by hand, you would first determine the absolute $t$ score as defined here. $$|t| = \frac{|\bar{x}_1-\bar{x}_2|}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$$ You’d also need to calculate the degrees of freedom.

    $$\text{df} = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2(n_1-1)}+\frac{s_2^4}{n_2^2(n_2-1)}}$$ And then, the $p$-value:

    $$p\text{-value} = P\left(|T|>|t|\right)$$

    However, this problem is easy when using a spreadsheet or R, so I would recommend using one of those tools.

    In a spreadsheet, you can use T.TEST with tails=2 for a two-tail test and type=3 for Welch’s test.

    In R, you can use t.test with the default settings.

    Determine the $p$-value.

    Is the difference of means significant (using a significance level of 0.05)?

    Yes, the drug causes a difference in average triglyceride level. / No, we don’t know whether the drug causes a difference.



    Solution

    To do this by hand, you first calculate the sample statistics. $$n_1 = 21 \qquad n_2 = 22 \qquad \bar{x}_1 = 255.4 \qquad \bar{x}_2 = 275.2 \qquad s_1 = 38.4 \qquad s_2 = 38.2$$ $$|t| = 1.7 \qquad \text{df} = 40.884$$ You will get quite close if you round df down (floor).

    Then, using a computer application or a $t$ table, you should be able to determine the following probabilities (using interpolation to estimate if using a $t$ table). $$P(T<1.7) = 0.9514 \qquad P(|T|>1.7) = 0.0972 \qquad p\text{-value} = 0.0972$$

    Spreadsheet

    You just need to use T.TEST with the proper settings. You can download the solution as a csv.

    [Figure: solution spreadsheet using T.TEST]

    R

    You just use t.test with the default settings. The hardest part is getting the data imported. You can do this in 2 ways: copy/paste or read.csv.

    copy/paste

    x1 = c(262.8,276.4,243,237.9,191.4,264.4,202.7,235.1,299.3,300.3,243.3,316.9,249,332.3,243.9,222.5,203.5,290.9,263.4,271,212.5)
    x2 = c(372.5,247.7,278.5,279.2,313.1,242.6,235.5,259.4,270.9,208.1,219.2,245.3,271.1,267.7,338.5,310.1,291.8,243.6,289.1,294.4,289.9,285.8)
    t.test(x1,x2)
    ## 
    ##  Welch Two Sample t-test
    ## 
    ## data:  x1 and x2
    ## t = -1.6977, df = 40.884, p-value = 0.09717
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -43.409861   3.760511
    ## sample estimates:
    ## mean of x mean of y 
    ##  255.3571  275.1818

    read.csv

    Download triglyceride.csv and move it to a directory (folder). Make a script, welchttest.r, and put it in the same directory. Set the working directory to this directory. Then, run the script.

    ### welchttest.r
    data = read.csv("triglyceride.csv")
    x1 = data$x1
    x2 = data$x2
    t.test(x1,x2)
    ## 
    ##  Welch Two Sample t-test
    ## 
    ## data:  x1 and x2
    ## t = -1.6977, df = 40.884, p-value = 0.09717
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -43.409861   3.760511
    ## sample estimates:
    ## mean of x mean of y 
    ##  255.3571  275.1818
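    The by-hand formulas can be checked in R without calling `t.test`. This sketch uses the same copy/paste data; `tabs` and `dof` are my own variable names (chosen to avoid masking R's `t()` and `df()` functions).

```r
# Triglyceride data from the problem (control = x1, treatment = x2)
x1 = c(262.8,276.4,243,237.9,191.4,264.4,202.7,235.1,299.3,300.3,243.3,316.9,249,332.3,243.9,222.5,203.5,290.9,263.4,271,212.5)
x2 = c(372.5,247.7,278.5,279.2,313.1,242.6,235.5,259.4,270.9,208.1,219.2,245.3,271.1,267.7,338.5,310.1,291.8,243.6,289.1,294.4,289.9,285.8)

n1 = length(x1)
n2 = length(x2)
v1 = var(x1)/n1   # s1^2 / n1
v2 = var(x2)/n2   # s2^2 / n2

# Absolute Welch t score
tabs = abs(mean(x1) - mean(x2))/sqrt(v1 + v2)

# Welch degrees of freedom; same as the df formula above,
# because s^4 / (n^2 (n - 1)) = (s^2 / n)^2 / (n - 1)
dof = (v1 + v2)^2/(v1^2/(n1 - 1) + v2^2/(n2 - 1))

# Two-tail p-value
pvalue = 2*pt(-tabs, dof)
print(c(t = tabs, df = dof, pvalue = pvalue))
```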


  124. Question

    A doctor runs a controlled experiment. The sick patients are randomly assigned to two groups: control and treatment. The patients in the control group are given a placebo. The patients in the treatment group are given a drug.

    After a month, each patient was checked for whether they recovered from the sickness. A “0” means no recovery while a “1” means recovery. This data can also be downloaded as a csv.

    ## Control: 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1
    ## 
    ## Treatment: 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0

    You are asked to perform a two-tailed two-proportion test. You can do a two-proportion $z$ test (which is equivalent to a 2x2 $\chi^2$ test [chi-squared test]). You will get credit whether or not you apply a continuity correction, but please pool the data for the standard error estimation.

    Just for completeness, you would also get credit for using Fisher’s exact test.

    Determine the $p$-value.

    Is the difference of proportions significant (using a significance level of 0.05)?

    Yes, the drug causes a difference in recovery. / No, we don’t know whether the drug causes a difference.



    Solution

    There are many ways to do this problem. I will show the following:

    “By hand”

    First, you need to determine the sample sizes ($n_1$ and $n_2$) and the sample totals (numbers of recoveries), $\sum x_1$ and $\sum x_2$. You could count, but a computer is probably helpful.

    $$n_1 = 76 \qquad n_2 = 57 \qquad \sum x_1 = 53 \qquad \sum x_2 = 31$$

    It can be helpful to organize these summary statistics into a contingency table.

                  Recover  Not_recover  TOTAL
    Control            53           23     76
    Treatment          31           26     57
    TOTAL              84           49    133

    Calculate the proportions. $$\hat{p}_1 = \frac{53}{76} = 0.6973684 \qquad \hat{p}_2 = \frac{31}{57} = 0.5438596 \qquad \hat{p} = \frac{53+31}{76+57} = 0.6315789$$

    Determine the absolute $z$ score.

    $$|z| = \frac{\left|\hat{p}_2-\hat{p}_1\right|}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = 1.8162079$$

    Using a $z$ table or an online standard-normal probability tool, determine the appropriate probabilities.

    $$P(Z<1.8162) = 0.9653308 \qquad P(|Z|>1.8162) = (2)(1-0.9653308) = 0.0693385 \qquad p\text{-value} = 0.0693385$$ This $p$-value did not use the continuity correction.

    Spreadsheet

    Solution download.

    [Figure: solution spreadsheet]

    If you wanted to make Yates’ continuity correction, you would calculate the EXPECTED table the same way, but then make another table in which each observed count is moved 0.5 closer to its expected count. Then, you would use =CHISQ.TEST on these new values (as the “observed”) and the expected values.

    It looks challenging to do a Fisher exact test in a spreadsheet.

    R

    x1 = c(1,1,0,1,1,1,1,0,1,1,0,1,0,0,1,0,1,0,1,1,1,1,1,0,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,0,0,0,1,0,1,1,1,1,0,1,1,0,1,0,0,1,1,1,1,1,1,1,1,1,0,1,0,1,0,1,1,0,1,1,1,1)
    x2 = c(0,0,0,0,0,0,1,1,1,1,1,1,0,1,0,1,1,0,0,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,1,0,0,0,0,1,1,0,1,1,1,1,0,1,0,1,1,0,1,0,1,0,0)
    n1 = length(x1)
    n2 = length(x2)
    ns1 = sum(x1) #number of successes in sample 1
    ns2 = sum(x2)
    prop.test(c(ns1,ns2),c(n1,n2))
    ## 
    ##  2-sample test for equality of proportions with continuity
    ##  correction
    ## 
    ## data:  c(ns1, ns2) out of c(n1, n2)
    ## X-squared = 2.6719, df = 1, p-value = 0.1021
    ## alternative hypothesis: two.sided
    ## 95 percent confidence interval:
    ##  -0.02733009  0.33434763
    ## sample estimates:
    ##    prop 1    prop 2 
    ## 0.6973684 0.5438596
    prop.test(c(ns1,ns2),c(n1,n2),correct=F)
    ## 
    ##  2-sample test for equality of proportions without continuity
    ##  correction
    ## 
    ## data:  c(ns1, ns2) out of c(n1, n2)
    ## X-squared = 3.2986, df = 1, p-value = 0.06934
    ## alternative hypothesis: two.sided
    ## 95 percent confidence interval:
    ##  -0.01197921  0.31899676
    ## sample estimates:
    ##    prop 1    prop 2 
    ## 0.6973684 0.5438596
    nf1 = n1-ns1
    nf2 = n2-ns2
    conttab = matrix(c(ns1,nf1,ns2,nf2),nrow=2)
    fisher.test(conttab)
    ## 
    ##  Fisher's Exact Test for Count Data
    ## 
    ## data:  conttab
    ## p-value = 0.1016
    ## alternative hypothesis: true odds ratio is not equal to 1
    ## 95 percent confidence interval:
    ##  0.8892012 4.2017117
    ## sample estimates:
    ## odds ratio 
    ##   1.922894


  125. Question

    An automatic bottle filler is supposed to average 300.00 ml of fluid in each bottle. You sampled some random bottles, recording their volumes:

    298.19, 298.05, 299.35, 297.07, 297.82

    Download data.

    You are asked to determine a 95% confidence interval, calculate an appropriate $p$-value (using a two-tailed $t$ test), and state whether the filler needs adjustment, using a significance level of 0.05.

    Determine the lower boundary of the confidence interval.

    Determine the upper boundary of the confidence interval.

    Determine the $p$-value.

    Does the filler need adjustment?

    Yes / No



    Solution

    $n = 5$, $\text{df} = 4$

    $t^\star = 2.78$, $\bar{x} = 298.096$, $s = 0.8235$

    $$\text{CI}_\text{bounds} = \bar{x} \pm \frac{t^\star s}{\sqrt{n}}$$

    $$\text{CI}_\text{bounds} = (297.074,\ 299.118)$$

    $$t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{298.096-300}{0.8235/\sqrt{5}} = -5.17$$

    $$p\text{-value} = P\left(|T|>5.17\right) = 0.007$$
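    The interval and test above can be sketched in R as a check. This is a minimal sketch with illustrative names. It uses qt() for $t^\star$ instead of the rounded 2.78, and then confirms the result against t.test(x, mu = 300).

```r
# Bottle volumes (ml) from the problem statement
x <- c(298.19, 298.05, 299.35, 297.07, 297.82)
mu0 <- 300                         # target mean volume

n <- length(x)
xbar <- mean(x)
s <- sd(x)                         # sample standard deviation (n - 1 denominator)
t_star <- qt(0.975, df = n - 1)    # critical value for a 95% interval

# Confidence interval: xbar +/- t* s / sqrt(n)
ci <- xbar + c(-1, 1) * t_star * s / sqrt(n)

# Test statistic and two-tailed p-value
t_stat <- (xbar - mu0) / (s / sqrt(n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# t.test() gives the same interval and p-value in one call
res <- t.test(x, mu = mu0)

print(c(lower = ci[1], upper = ci[2], t = t_stat, p = p_value))
```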


    1. 297.0735442
    2. 299.1184558
    3. 0.0066508
    4. Yes. The $p$-value is less than 0.05, so the difference is significant and the filler needs adjustment.

  126. Question

    A scratch-off lottery has a stated 0.63 chance of winning. You sampled some tickets, marking a win as “1” and a loss as “0”.

    1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
    1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
    1, 0, 1, 1

    Download data.

    Please determine a 95% confidence interval, calculate an appropriate $p$-value, and state whether the sample proportion is significantly different from the stated chance (using a significance level of 0.05).

    For the confidence interval, you can use a normal approximation interval, a Wilson score interval, a Wilson score interval with continuity correction, or the exact Clopper-Pearson interval (see descriptions here).

    For the $p$-value, you can similarly use a $z$ test, a $\chi^2$ test (with or without continuity correction), or an exact test.

    Determine the lower boundary of the confidence interval.

    Determine the upper boundary of the confidence interval.

    Determine the $p$-value.

    Using a significance level of 0.05, is the sample proportion significantly different from the stated chance?

    Yes / No



    Solution

    By hand, this is easiest to do with a normal approximation without the continuity correction.

    $n = 64$, $\hat{p} = 0.7969$, $z^\star = 1.96$

    $$\text{CI}_\text{boundaries} = \hat{p} \pm z^\star\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = (0.6983054,\ 0.8954446)$$

    $$z = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}} = 2.7650955$$

    $$p\text{-value} = P\left(|Z|>2.7650955\right) = 0.0056906$$
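    The normal-approximation interval and $z$ test can be sketched in R directly from the 0/1 data. This is a minimal sketch with illustrative names. It uses $z^\star = 1.96$ as above. Using qnorm(0.975) = 1.959964 instead would change the bounds only in the sixth decimal place.

```r
# Scratch-off outcomes: 1 = win, 0 = loss
x <- c(1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
       1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
       1, 0, 1, 1)
p0 <- 0.63        # stated chance of winning
z_star <- 1.96    # critical value for a 95% interval

n <- length(x)
p_hat <- mean(x)  # sample proportion of wins

# Confidence interval: standard error uses p_hat
ci <- p_hat + c(-1, 1) * z_star * sqrt(p_hat * (1 - p_hat) / n)

# z test: standard error uses the null value p0
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- 2 * pnorm(-abs(z))

print(c(n = n, wins = sum(x), lower = ci[1], upper = ci[2], z = z, p = p_value))
```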


    1. Normal approx: 0.6983054. Wilson: 0.6828636. Wilson with CC: 0.6742437. Exact Clopper-Pearson: 0.6777357.
    2. Normal approx: 0.8954446. Wilson: 0.8772658. Wilson with CC: 0.8833359. Exact Clopper-Pearson: 0.8871711.
    3. Normal approx: 0.0056906. Chi-squared: 0.0056906. Chi-squared with CC: 0.0083978. Exact: 0.0060842.
    4. Yes. The $p$-value is less than 0.05, so the difference is significant.
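    The alternative intervals and tests listed above can be reproduced, as a sketch, with base R's prop.test and binom.test. The sketch uses the 51 wins out of 64 tickets ($\hat{p} = 51/64 = 0.7969$). For a single proportion, prop.test reports a Wilson score interval, with a continuity correction when correct = TRUE. binom.test reports the exact Clopper-Pearson interval. The variable names are illustrative.

```r
# Counts from the sample: 51 wins in 64 tickets
wins <- 51; n <- 64; p0 <- 0.63

# Wilson score interval and chi-squared test, no continuity correction
wilson <- prop.test(wins, n, p = p0, correct = FALSE)

# Wilson interval and chi-squared test, both with continuity correction
wilson_cc <- prop.test(wins, n, p = p0, correct = TRUE)

# Exact binomial test with the Clopper-Pearson interval
exact <- binom.test(wins, n, p = p0)

print(rbind(
  wilson    = c(wilson$conf.int,    p = wilson$p.value),
  wilson_cc = c(wilson_cc$conf.int, p = wilson_cc$p.value),
  exact     = c(exact$conf.int,     p = exact$p.value)
))
```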

  127. Question

    A study asked individuals to time a mile run (in seconds). After a month, the same individuals timed another mile run. You are asked to perform a paired-data $t$ test to investigate whether fitness changed.

    name x1 x2
    Angel 490 441
    Italia 316 252
    June 337 283
    Kalani 378 362
    Luke 451 408
    Maxwell 623 593
    Sarita 703 657
    Serenity 382 390
    Theodore 279 252
    Zara 431 469

    (download as run_times.csv)

    Please run a paired $t$ test to check whether there was a significant change in running times.

    Calculate the $p$-value.

    Using a significance level of $\alpha = 0.05$, is there a significant change in run times?

    Yes / No



    Solution

    You first need to determine a list of differences.

    name x1 x2 d = x2-x1
    Angel 490 441 -49
    Italia 316 252 -64
    June 337 283 -54
    Kalani 378 362 -16
    Luke 451 408 -43
    Maxwell 623 593 -30
    Sarita 703 657 -46
    Serenity 382 390 8
    Theodore 279 252 -27
    Zara 431 469 38

    Determine the sample size and degrees of freedom: $n = 10$, $\text{df} = 9$.

    Determine the sample mean of the differences: $\bar{d} = \frac{\sum d}{n} = \frac{-283}{10} = -28.3$.

    Determine the standard deviation of the differences. You probably do not want to do this by hand. $s_d = \sqrt{\frac{\sum (d-\bar{d})^2}{n-1}} = 31.20$

    Calculate the $t$ score: $t = \frac{|\bar{d}-0|}{s_d/\sqrt{n}} = \frac{28.3}{31.20/\sqrt{10}} = 2.87$.

    Restate the $p$-value: $p\text{-value} = P\left(|T|>2.87\right)$. I would recommend using a computer program, like a spreadsheet or R, to determine this $p$-value.

    $p\text{-value} = 0.0185$

    The $p$-value is less than 0.05, so the result is significant.
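    The by-hand steps can be checked in R with the data typed in directly, so no CSV is needed. This is a minimal sketch with illustrative variable names. It uses the same $d = x_2 - x_1$ convention as the table of differences.

```r
# Mile times (seconds): first run (x1) and run one month later (x2)
x1 <- c(490, 316, 337, 378, 451, 623, 703, 382, 279, 431)
x2 <- c(441, 252, 283, 362, 408, 593, 657, 390, 252, 469)

d <- x2 - x1          # differences, as in the table
n <- length(d)
d_bar <- mean(d)      # mean difference
s_d <- sd(d)          # standard deviation of the differences

# t score and two-tailed p-value with n - 1 degrees of freedom
t_stat <- abs(d_bar - 0) / (s_d / sqrt(n))
p_value <- 2 * pt(t_stat, df = n - 1, lower.tail = FALSE)

print(round(c(d_bar = d_bar, s_d = s_d, t = t_stat, p = p_value), 4))
```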

    Spreadsheet

    The solution spreadsheet can be downloaded as a csv. The first 10 rows are shown below.

    [Figure: first 10 rows of the solution spreadsheet]

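    If your spreadsheet does not have the TDIST function, you can try T.DIST.2T(G6,G3). It takes only the $t$ score and the degrees of freedom and always returns the two-tailed probability. Also, note that T.TEST does everything for you (use tails = 2 and type = 1 for paired data), so you can just use that.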

    R

    Make or use a directory (folder) for this problem (paired-data $t$ test), and set the working directory to it. Save run_times.csv to your working directory. I would recommend saving the following script as paired_data_hypothesis_test.r in the same directory.

    table = read.csv("run_times.csv")
    x1 = table[['x1']]
    x2 = table[['x2']]
    t.test(x1,x2,paired=T)
    ## 
    ##  Paired t-test
    ## 
    ## data:  x1 and x2
    ## t = 2.8682, df = 9, p-value = 0.01853
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##   5.979412 50.620588
    ## sample estimates:
    ## mean of the differences 
    ##                    28.3

    Or, if you wanted to do it the long way:

    table = read.csv("run_times.csv")
    x1 = table[["x1"]]
    x2 = table[["x2"]]
    d = x2-x1
    n = length(d)
    t = abs(mean(d))/(sd(d)/sqrt(n))
    cumulative = pt(t,n-1)
    pvalue = 2*(1-cumulative)
    pvalue
    ## [1] 0.01853216


  128. Question

    A doctor runs a controlled experiment. The participants are randomly assigned to two groups: control and treatment. The participants in the control group are given a placebo. The participants in the treatment group are given a drug.

    After a month, each participant’s triglyceride level (in mg/dL) is measured. Those measurements are shown below. They can also be downloaded as a csv.

    Control: 500.1, 519.3, 525.1, 514.5, 498.8, 514, 514.3, 515.3, 525.3, 499.6, 529.8, 518.4, 570.1, 517.4, 520.6, 519.3, 514, 506.9, 520.8, 494.6, 531.4, 486.3, 524.7, 520.9

    Treatment: 562.4, 536, 525, 552.3, 523.9, 533.8, 539, 533.2, 528.6, 529.4, 503.2

    You are asked to perform a two-tailed, two-sample Welch’s $t$ test to determine whether there is a significant difference between the means of the two samples.

    Determine the $p$-value.

    Is the difference of means significant (using a significance level of 0.05)?

    Yes, the drug causes a difference in average triglyceride level. / No, we don’t know whether the drug causes a difference.



    Solution

    To do this by hand, first calculate the sample statistics: $n_1 = 24$, $n_2 = 11$, $\bar{x}_1 = 516.7$, $\bar{x}_2 = 533.3$, $s_1 = 16.1$, $s_2 = 15.3$. Then calculate the Welch $t$ score and the Welch-Satterthwaite degrees of freedom:

    $$|t| = \frac{\left|\bar{x}_1-\bar{x}_2\right|}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} = 2.93$$

    $$\text{df} = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1}+\frac{\left(s_2^2/n_2\right)^2}{n_2-1}} = 20.369$$

    You will get quite close if you round df down (floor) to 20.

    Then, using a computer application or a $t$ table (interpolating if you use a table), you should be able to determine the following probabilities: $P(T<2.93) = 0.9959$ and $P(|T|>2.93) = 0.0081$, so the $p$-value is $0.0081$.
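    The Welch statistic and the Welch-Satterthwaite degrees of freedom can be computed directly in R as a check on the hand calculation. This is a minimal sketch with the data pasted in and illustrative variable names. It also shows the effect of rounding df down, which gives a slightly larger, more conservative $p$-value.

```r
# Triglyceride levels (mg/dL) from the problem statement
x1 <- c(500.1, 519.3, 525.1, 514.5, 498.8, 514, 514.3, 515.3, 525.3, 499.6,
        529.8, 518.4, 570.1, 517.4, 520.6, 519.3, 514, 506.9, 520.8, 494.6,
        531.4, 486.3, 524.7, 520.9)                       # control
x2 <- c(562.4, 536, 525, 552.3, 523.9, 533.8, 539, 533.2,
        528.6, 529.4, 503.2)                              # treatment

n1 <- length(x1)
n2 <- length(x2)
se1_sq <- var(x1) / n1   # s1^2 / n1
se2_sq <- var(x2) / n2   # s2^2 / n2

# Welch t statistic (variances are not pooled)
t_stat <- (mean(x1) - mean(x2)) / sqrt(se1_sq + se2_sq)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se1_sq + se2_sq)^2 /
  (se1_sq^2 / (n1 - 1) + se2_sq^2 / (n2 - 1))

# Two-tailed p-values: with the exact df, and with df rounded down
p_value <- 2 * pt(-abs(t_stat), welch_df)
p_floor <- 2 * pt(-abs(t_stat), floor(welch_df))

print(c(t = t_stat, df = welch_df, p = p_value, p_floor_df = p_floor))
```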

    Spreadsheet

    You just need to use T.TEST with two tails (tails = 2) and unequal variances (type = 3). You can download the solution as a csv.

    [Figure: solution spreadsheet]

    R

    You just use t.test with the default settings. By default, t.test does not assume equal variances, so it performs Welch’s test. The hardest part is importing the data. You can do this in two ways: copy/paste or read.csv.

    copy/paste

    x1 = c(500.1,519.3,525.1,514.5,498.8,514,514.3,515.3,525.3,499.6,529.8,518.4,570.1,517.4,520.6,519.3,514,506.9,520.8,494.6,531.4,486.3,524.7,520.9)
    x2 = c(562.4,536,525,552.3,523.9,533.8,539,533.2,528.6,529.4,503.2)
    t.test(x1,x2)
    ## 
    ##  Welch Two Sample t-test
    ## 
    ## data:  x1 and x2
    ## t = -2.9325, df = 20.369, p-value = 0.008128
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -28.422293  -4.810283
    ## sample estimates:
    ## mean of x mean of y 
    ##  516.7292  533.3455

    read.csv

    Download triglyceride.csv and move it to a directory (folder). Make a script, welchttest.r, and put it in the same directory. Set the working directory to this directory. Then, run the script.

    ### welchttest.r
    data = read.csv("triglyceride.csv")
    x1 = data$x1
    x2 = data$x2
    t.test(x1,x2)
    ## 
    ##  Welch Two Sample t-test
    ## 
    ## data:  x1 and x2
    ## t = -2.9325, df = 20.369, p-value = 0.008128
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -28.422293  -4.810283
    ## sample estimates:
    ## mean of x mean of y 
    ##  516.7292  533.3455


  129. Question

    A doctor runs a controlled experiment. The sick patients are randomly assigned to two groups: control and treatment. The patients in the control group are given a placebo. The patients in the treatment group are given a drug.

    After a month, each patient was checked for whether they recovered from the sickness. A “0” means no recovery while a “1” means recovery. This data can also be downloaded as a csv.

    Control: 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1

    Treatment: 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1

    You are asked to perform a two-tailed two-proportion test. You can do a two-proportion $z$ test, which is equivalent to a 2x2 $\chi^2$ (chi-squared) test. You will get credit whether or not you apply a continuity correction, but please pool the data for the standard error estimation.

    Just for completeness, you can also get credit for using Fisher’s exact test.

    Determine the $p$-value.

    Is the difference of proportions significant (using a significance level of 0.05)?

    Yes, the drug causes a difference in recovery. / No, we don’t know whether the drug causes a difference.



    Solution

    There are many ways to do this problem. I will show the following:

    “By hand”

    First, you need to determine the sample sizes ($n_1$ and $n_2$) and the sample totals (numbers of recoveries), $\sum x_1$ and $\sum x_2$. You could count them by hand, but a computer is helpful.

    $n_1 = 79$, $n_2 = 72$, $\sum x_1 = 40$, $\sum x_2 = 50$

    It can be helpful to organize these summary statistics into a contingency table.

    Recover Not_recover TOTAL
    Control 40 39 79
    Treatment 50 22 72
    TOTAL 90 61 151
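    Building the contingency table directly from the raw 0/1 data is a useful check on the counts. For example, the treatment group has $72 - 50 = 22$ non-recoveries. This is a minimal sketch with illustrative names. Note that addmargins labels the totals "Sum" rather than TOTAL. The uncorrected chi-squared statistic should equal $z^2$.

```r
# Raw outcomes (1 = recovered, 0 = not recovered)
x1 <- c(1,0,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,0,1,0,1,1,1,1,0,1,1,0,1,1,1,0,0,1,1,1,0,1,0,1,1,1,0,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1)
x2 <- c(1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,0,1,1,1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,1,1,1,1,0,1,1,0,0,1,0,1,1,1,0,1,1,0,1,0,1,1,0,1,1)

# Stack both samples with a group label, then cross-tabulate
group <- factor(rep(c("Control", "Treatment"), c(length(x1), length(x2))),
                levels = c("Control", "Treatment"))
outcome <- factor(c(x1, x2), levels = c(1, 0),
                  labels = c("Recover", "Not_recover"))
conttab <- table(group, outcome)
print(addmargins(conttab))   # row and column totals are labelled "Sum"

# Pearson chi-squared test without continuity correction (equals z^2)
chi <- chisq.test(conttab, correct = FALSE)
print(chi)
```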

    Calculate the proportions, including the pooled proportion $\hat{p}$. $\hat{p}_1 = \frac{40}{79} = 0.5063291$, $\hat{p}_2 = \frac{50}{72} = 0.6944444$, $\hat{p} = \frac{40+50}{79+72} = 0.5960265$

    Determine the absolute $z$ score.

    $$|z| = \frac{\left|\hat{p}_2-\hat{p}_1\right|}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = 2.3529153$$

    Using a $z$ table or an online standard-normal probability tool, determine the appropriate probabilities.

    $P(Z<2.3529) = 0.9906866$, so $P(|Z|>2.3529) = 2\,(1-P(Z<2.3529)) = 0.0186269$ and the $p$-value is $0.0186269$. This $p$-value does not use the continuity correction.

    Spreadsheet

    Solution download.

    [Figure: solution spreadsheet]

    To apply Yates’ continuity correction, calculate the EXPECTED table as before, then make another table in which each observed value is moved 0.5 closer to its expected value. Then use =CHISQ.TEST with these adjusted values as the “observed” range and the expected values as the “expected” range.

    Fisher’s exact test appears to be difficult to do in a spreadsheet.

    R

    x1 = c(1,0,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,0,1,0,1,1,1,1,0,1,1,0,1,1,1,0,0,1,1,1,0,1,0,1,1,1,0,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1)
    x2 = c(1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,1,0,1,1,1,0,0,0,1,1,1,0,0,0,1,0,1,1,1,0,1,1,1,1,0,1,1,0,0,1,0,1,1,1,0,1,1,0,1,0,1,1,0,1,1)
    n1 = length(x1)
    n2 = length(x2)
    ns1 = sum(x1) #number of successes in sample 1
    ns2 = sum(x2)
    prop.test(c(ns1,ns2),c(n1,n2))
    ## 
    ##  2-sample test for equality of proportions with continuity
    ##  correction
    ## 
    ## data:  c(ns1, ns2) out of c(n1, n2)
    ## X-squared = 4.7825, df = 1, p-value = 0.02875
    ## alternative hypothesis: two.sided
    ## 95 percent confidence interval:
    ##  -0.35460684 -0.02162383
    ## sample estimates:
    ##    prop 1    prop 2 
    ## 0.5063291 0.6944444
    prop.test(c(ns1,ns2),c(n1,n2),correct=F)
    ## 
    ##  2-sample test for equality of proportions without continuity
    ##  correction
    ## 
    ## data:  c(ns1, ns2) out of c(n1, n2)
    ## X-squared = 5.5362, df = 1, p-value = 0.01863
    ## alternative hypothesis: two.sided
    ## 95 percent confidence interval:
    ##  -0.34133328 -0.03489738
    ## sample estimates:
    ##    prop 1    prop 2 
    ## 0.5063291 0.6944444
    nf1 = n1-ns1
    nf2 = n2-ns2
    conttab = matrix(c(ns1,nf1,ns2,nf2),nrow=2)
    fisher.test(conttab)
    ## 
    ##  Fisher's Exact Test for Count Data
    ## 
    ## data:  conttab
    ## p-value = 0.02092
    ## alternative hypothesis: true odds ratio is not equal to 1
    ## 95 percent confidence interval:
    ##  0.2184936 0.9254738
    ## sample estimates:
    ## odds ratio 
    ##  0.4537105